Background

Open-Meteo maintains an API for historical weather that allows for non-commercial usage of historical weather data maintained by the website.

This file builds on _v001, _v002, _v003, and _v004 to run exploratory analysis on some historical weather data.

Functions and Libraries

The exploration process uses tidyverse, ranger, several generic custom functions, and several functions specific to Open Meteo processing. First, tidyverse, ranger, and the generic functions are loaded:

library(tidyverse) # tidyverse functionality is included throughout
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ranger) # predict() does not work on ranger objects unless ranger has been called

source("./Generic_Added_Utility_Functions_202105_v001.R") # Basic functions

Next, specific functions written in _v001, _v002, _v003, and _v004 are sourced:

source("./SimpleOneVar_Functions_202411_v001.R") # Functions for basic single variable analysis
source("./OpenMeteo_NextBest_202411_v001.R") # Functions for finding 'next best' predictor given existing model
source("./OpenMeteo_Functions_202411_v001.R") # Core functions for loading, processing, analysis of Open Meteo

Key mapping tables for available metrics are also copied:

hourlyMetrics <- "temperature_2m,relativehumidity_2m,dewpoint_2m,apparent_temperature,pressure_msl,surface_pressure,precipitation,rain,snowfall,cloudcover,cloudcover_low,cloudcover_mid,cloudcover_high,shortwave_radiation,direct_radiation,direct_normal_irradiance,diffuse_radiation,windspeed_10m,windspeed_100m,winddirection_10m,winddirection_100m,windgusts_10m,et0_fao_evapotranspiration,weathercode,vapor_pressure_deficit,soil_temperature_0_to_7cm,soil_temperature_7_to_28cm,soil_temperature_28_to_100cm,soil_temperature_100_to_255cm,soil_moisture_0_to_7cm,soil_moisture_7_to_28cm,soil_moisture_28_to_100cm,soil_moisture_100_to_255cm"
dailyMetrics <- "weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration"

hourlyDescription <- "Air temperature at 2 meters above ground\nRelative humidity at 2 meters above ground\nDew point temperature at 2 meters above ground\nApparent temperature is the perceived feels-like temperature combining wind chill factor, relative humidity and solar radiation\nAtmospheric air pressure reduced to mean sea level (msl) or pressure at surface. Typically pressure on mean sea level is used in meteorology. Surface pressure gets lower with increasing elevation.\nAtmospheric air pressure reduced to mean sea level (msl) or pressure at surface. Typically pressure on mean sea level is used in meteorology. Surface pressure gets lower with increasing elevation.\nTotal precipitation (rain, showers, snow) sum of the preceding hour. Data is stored with a 0.1 mm precision. If precipitation data is summed up to monthly sums, there might be small inconsistencies with the total precipitation amount.\nOnly liquid precipitation of the preceding hour including local showers and rain from large scale systems.\nSnowfall amount of the preceding hour in centimeters. For the water equivalent in millimeter, divide by 7. E.g. 7 cm snow = 10 mm precipitation water equivalent\nTotal cloud cover as an area fraction\nLow level clouds and fog up to 2 km altitude\nMid level clouds from 2 to 6 km altitude\nHigh level clouds from 6 km altitude\nShortwave solar radiation as average of the preceding hour. This is equal to the total global horizontal irradiation\nDirect solar radiation as average of the preceding hour on the horizontal plane and the normal plane (perpendicular to the sun)\nDirect solar radiation as average of the preceding hour on the horizontal plane and the normal plane (perpendicular to the sun)\nDiffuse solar radiation as average of the preceding hour\nWind speed at 10 or 100 meters above ground. Wind speed on 10 meters is the standard level.\nWind speed at 10 or 100 meters above ground. Wind speed on 10 meters is the standard level.\nWind direction at 10 or 100 meters above ground\nWind direction at 10 or 100 meters above ground\nGusts at 10 meters above ground of the indicated hour. Wind gusts in CERRA are defined as the maximum wind gusts of the preceding hour. Please consult the ECMWF IFS documentation for more information on how wind gusts are parameterized in weather models.\nET0 Reference Evapotranspiration of a well watered grass field. Based on FAO-56 Penman-Monteith equations ET0 is calculated from temperature, wind speed, humidity and solar radiation. Unlimited soil water is assumed. ET0 is commonly used to estimate the required irrigation for plants.\nWeather condition as a numeric code. Follow WMO weather interpretation codes. See table below for details. Weather code is calculated from cloud cover analysis, precipitation and snowfall. As barely no information about atmospheric stability is available, estimation about thunderstorms is not possible.\nVapor Pressure Deificit (VPD) in kilopascal (kPa). For high VPD (>1.6), water transpiration of plants increases. For low VPD (<0.4), transpiration decreases\nAverage temperature of different soil levels below ground.\nAverage temperature of different soil levels below ground.\nAverage temperature of different soil levels below ground.\nAverage temperature of different soil levels below ground.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths."
dailyDescription <- "The most severe weather condition on a given day\nMaximum and minimum daily air temperature at 2 meters above ground\nMaximum and minimum daily air temperature at 2 meters above ground\nMaximum and minimum daily apparent temperature\nMaximum and minimum daily apparent temperature\nSum of daily precipitation (including rain, showers and snowfall)\nSum of daily rain\nSum of daily snowfall\nThe number of hours with rain\nSun rise and set times\nSun rise and set times\nMaximum wind speed and gusts on a day\nMaximum wind speed and gusts on a day\nDominant wind direction\nThe sum of solar radiaion on a given day in Megajoules\nDaily sum of ET0 Reference Evapotranspiration of a well watered grass field"

# Create tibble for hourly metrics
tblMetricsHourly <- tibble::tibble(metric=hourlyMetrics %>% str_split_1(","), 
                                   description=hourlyDescription %>% str_split_1("\n")
                                   )
tblMetricsHourly %>% 
    print(n=50)
## # A tibble: 33 × 2
##    metric                        description                                    
##    <chr>                         <chr>                                          
##  1 temperature_2m                Air temperature at 2 meters above ground       
##  2 relativehumidity_2m           Relative humidity at 2 meters above ground     
##  3 dewpoint_2m                   Dew point temperature at 2 meters above ground 
##  4 apparent_temperature          Apparent temperature is the perceived feels-li…
##  5 pressure_msl                  Atmospheric air pressure reduced to mean sea l…
##  6 surface_pressure              Atmospheric air pressure reduced to mean sea l…
##  7 precipitation                 Total precipitation (rain, showers, snow) sum …
##  8 rain                          Only liquid precipitation of the preceding hou…
##  9 snowfall                      Snowfall amount of the preceding hour in centi…
## 10 cloudcover                    Total cloud cover as an area fraction          
## 11 cloudcover_low                Low level clouds and fog up to 2 km altitude   
## 12 cloudcover_mid                Mid level clouds from 2 to 6 km altitude       
## 13 cloudcover_high               High level clouds from 6 km altitude           
## 14 shortwave_radiation           Shortwave solar radiation as average of the pr…
## 15 direct_radiation              Direct solar radiation as average of the prece…
## 16 direct_normal_irradiance      Direct solar radiation as average of the prece…
## 17 diffuse_radiation             Diffuse solar radiation as average of the prec…
## 18 windspeed_10m                 Wind speed at 10 or 100 meters above ground. W…
## 19 windspeed_100m                Wind speed at 10 or 100 meters above ground. W…
## 20 winddirection_10m             Wind direction at 10 or 100 meters above ground
## 21 winddirection_100m            Wind direction at 10 or 100 meters above ground
## 22 windgusts_10m                 Gusts at 10 meters above ground of the indicat…
## 23 et0_fao_evapotranspiration    ET0 Reference Evapotranspiration of a well wat…
## 24 weathercode                   Weather condition as a numeric code. Follow WM…
## 25 vapor_pressure_deficit        Vapor Pressure Deificit (VPD) in kilopascal (k…
## 26 soil_temperature_0_to_7cm     Average temperature of different soil levels b…
## 27 soil_temperature_7_to_28cm    Average temperature of different soil levels b…
## 28 soil_temperature_28_to_100cm  Average temperature of different soil levels b…
## 29 soil_temperature_100_to_255cm Average temperature of different soil levels b…
## 30 soil_moisture_0_to_7cm        Average soil water content as volumetric mixin…
## 31 soil_moisture_7_to_28cm       Average soil water content as volumetric mixin…
## 32 soil_moisture_28_to_100cm     Average soil water content as volumetric mixin…
## 33 soil_moisture_100_to_255cm    Average soil water content as volumetric mixin…
# Create tibble for daily metrics
tblMetricsDaily <- tibble::tibble(metric=dailyMetrics %>% str_split_1(","), 
                                  description=dailyDescription %>% str_split_1("\n")
                                   )
tblMetricsDaily
## # A tibble: 16 × 2
##    metric                     description                                       
##    <chr>                      <chr>                                             
##  1 weathercode                The most severe weather condition on a given day  
##  2 temperature_2m_max         Maximum and minimum daily air temperature at 2 me…
##  3 temperature_2m_min         Maximum and minimum daily air temperature at 2 me…
##  4 apparent_temperature_max   Maximum and minimum daily apparent temperature    
##  5 apparent_temperature_min   Maximum and minimum daily apparent temperature    
##  6 precipitation_sum          Sum of daily precipitation (including rain, showe…
##  7 rain_sum                   Sum of daily rain                                 
##  8 snowfall_sum               Sum of daily snowfall                             
##  9 precipitation_hours        The number of hours with rain                     
## 10 sunrise                    Sun rise and set times                            
## 11 sunset                     Sun rise and set times                            
## 12 windspeed_10m_max          Maximum wind speed and gusts on a day             
## 13 windgusts_10m_max          Maximum wind speed and gusts on a day             
## 14 winddirection_10m_dominant Dominant wind direction                           
## 15 shortwave_radiation_sum    The sum of solar radiaion on a given day in Megaj…
## 16 et0_fao_evapotranspiration Daily sum of ET0 Reference Evapotranspiration of …

A previously existing dataset is loaded, with key analysis variables defined in a vector:

# Load previous data
allCity <- readFromRDS("allCity_20241116")

# Get core training variables
varsTrain <- getVarsTrain(allCity)
varsTrain
##  [1] "hour"                          "temperature_2m"               
##  [3] "relativehumidity_2m"           "dewpoint_2m"                  
##  [5] "apparent_temperature"          "pressure_msl"                 
##  [7] "surface_pressure"              "precipitation"                
##  [9] "rain"                          "snowfall"                     
## [11] "cloudcover"                    "cloudcover_low"               
## [13] "cloudcover_mid"                "cloudcover_high"              
## [15] "shortwave_radiation"           "direct_radiation"             
## [17] "direct_normal_irradiance"      "diffuse_radiation"            
## [19] "windspeed_10m"                 "windspeed_100m"               
## [21] "winddirection_10m"             "winddirection_100m"           
## [23] "windgusts_10m"                 "et0_fao_evapotranspiration"   
## [25] "weathercode"                   "vapor_pressure_deficit"       
## [27] "soil_temperature_0_to_7cm"     "soil_temperature_7_to_28cm"   
## [29] "soil_temperature_28_to_100cm"  "soil_temperature_100_to_255cm"
## [31] "soil_moisture_0_to_7cm"        "soil_moisture_7_to_28cm"      
## [33] "soil_moisture_28_to_100cm"     "soil_moisture_100_to_255cm"   
## [35] "year"                          "doy"
# Assign default label
keyLabel <- genericKeyLabelOM()
keyLabel
## [1] "predictions based on pre-2022 training data applied to 2022 holdout dataset"

The correlation heatmap is reproduced, with functions that borrowing from the recipe at STHDA:

# Default function
corVarsTrain <- makeHeatMap(allCity, vecSelect=varsTrain, returnData=TRUE)

corVarsTrain %>% filter(Var1!=Var2) %>% arrange(desc(value)) %>% print(n=20)
## # A tibble: 630 × 3
##    Var1                          Var2                         value
##    <fct>                         <fct>                        <dbl>
##  1 rain                          precipitation                0.989
##  2 apparent_temperature          temperature_2m               0.984
##  3 direct_radiation              shortwave_radiation          0.974
##  4 temperature_2m                soil_temperature_0_to_7cm    0.962
##  5 soil_temperature_28_to_100cm  soil_temperature_7_to_28cm   0.952
##  6 shortwave_radiation           et0_fao_evapotranspiration   0.944
##  7 apparent_temperature          soil_temperature_0_to_7cm    0.942
##  8 soil_moisture_7_to_28cm       soil_moisture_0_to_7cm       0.940
##  9 windspeed_100m                windspeed_10m                0.940
## 10 direct_radiation              et0_fao_evapotranspiration   0.928
## 11 direct_radiation              direct_normal_irradiance     0.922
## 12 soil_moisture_28_to_100cm     soil_moisture_7_to_28cm      0.922
## 13 soil_temperature_0_to_7cm     soil_temperature_7_to_28cm   0.919
## 14 apparent_temperature          soil_temperature_7_to_28cm   0.918
## 15 temperature_2m                soil_temperature_7_to_28cm   0.917
## 16 shortwave_radiation           direct_normal_irradiance     0.901
## 17 windspeed_10m                 windgusts_10m                0.884
## 18 soil_temperature_100_to_255cm soil_temperature_28_to_100cm 0.863
## 19 soil_moisture_100_to_255cm    soil_moisture_28_to_100cm    0.859
## 20 apparent_temperature          soil_temperature_28_to_100cm 0.853
## # ℹ 610 more rows

The correlation heatmap is produced for a single city:

corVarsTrainBOS <- makeHeatMap(allCity %>% filter(src=="Boston"), vecSelect=varsTrain, returnData=TRUE)

corVarsTrainBOS %>% filter(Var1!=Var2) %>% arrange(desc(value)) %>% print(n=20)
## # A tibble: 630 × 3
##    Var1                         Var2                         value
##    <fct>                        <fct>                        <dbl>
##  1 surface_pressure             pressure_msl                 1.00 
##  2 apparent_temperature         temperature_2m               0.993
##  3 temperature_2m               soil_temperature_0_to_7cm    0.965
##  4 apparent_temperature         soil_temperature_0_to_7cm    0.965
##  5 soil_temperature_0_to_7cm    soil_temperature_7_to_28cm   0.965
##  6 direct_radiation             shortwave_radiation          0.962
##  7 rain                         precipitation                0.959
##  8 soil_temperature_28_to_100cm soil_temperature_7_to_28cm   0.950
##  9 windspeed_100m               windspeed_10m                0.946
## 10 apparent_temperature         dewpoint_2m                  0.937
## 11 shortwave_radiation          et0_fao_evapotranspiration   0.935
## 12 windspeed_10m                windgusts_10m                0.927
## 13 apparent_temperature         soil_temperature_7_to_28cm   0.925
## 14 temperature_2m               dewpoint_2m                  0.917
## 15 direct_radiation             direct_normal_irradiance     0.916
## 16 soil_moisture_7_to_28cm      soil_moisture_0_to_7cm       0.915
## 17 temperature_2m               soil_temperature_7_to_28cm   0.914
## 18 direct_radiation             et0_fao_evapotranspiration   0.902
## 19 winddirection_100m           winddirection_10m            0.895
## 20 soil_temperature_0_to_7cm    soil_temperature_28_to_100cm 0.886
## # ℹ 610 more rows

Some variables, such as surface pressure vs. MSL pressure, are much differently correlated if controlling for city. Differences are explored:

tstCorDelta <- corVarsTrain %>% 
    mutate(v1=pmin(as.character(Var1), as.character(Var2)), 
           v2=pmax(as.character(Var1), as.character(Var2))
           ) %>% 
    select(v1, v2, value_all=value) %>%
    full_join(corVarsTrainBOS %>% 
                  mutate(v1=pmin(as.character(Var1), as.character(Var2)), 
                         v2=pmax(as.character(Var1), as.character(Var2))
                         ) %>% 
                  select(v1, v2, value_bos=value), 
              by=c("v1", "v2")
              ) %>%
    mutate(delta=value_bos-value_all)

tstCorDelta
## # A tibble: 666 × 5
##    v1                   v2                            value_all value_bos  delta
##    <chr>                <chr>                             <dbl>     <dbl>  <dbl>
##  1 dewpoint_2m          dewpoint_2m                     1          1      0     
##  2 dewpoint_2m          soil_temperature_7_to_28cm      0.675      0.874  0.199 
##  3 dewpoint_2m          soil_temperature_28_to_100cm    0.630      0.807  0.177 
##  4 dewpoint_2m          soil_temperature_0_to_7cm       0.623      0.875  0.252 
##  5 dewpoint_2m          temperature_2m                  0.691      0.917  0.226 
##  6 apparent_temperature dewpoint_2m                     0.785      0.937  0.152 
##  7 dewpoint_2m          soil_temperature_100_to_255cm   0.448      0.505  0.0567
##  8 dewpoint_2m          doy                             0.185      0.312  0.127 
##  9 dewpoint_2m          hour                            0.00764    0.0203 0.0127
## 10 dewpoint_2m          vapor_pressure_deficit         -0.0678     0.305  0.373 
## # ℹ 656 more rows
tstCorDelta %>% arrange(delta) %>% head(20)
## # A tibble: 20 × 5
##    v1                         v2                      value_all value_bos  delta
##    <chr>                      <chr>                       <dbl>     <dbl>  <dbl>
##  1 relativehumidity_2m        surface_pressure           0.578   -0.160   -0.738
##  2 soil_moisture_0_to_7cm     surface_pressure           0.594   -0.0914  -0.685
##  3 soil_moisture_7_to_28cm    surface_pressure           0.586   -0.0240  -0.610
##  4 dewpoint_2m                surface_pressure           0.364   -0.237   -0.601
##  5 soil_moisture_100_to_255cm surface_pressure           0.602    0.00609 -0.596
##  6 soil_moisture_28_to_100cm  surface_pressure           0.588    0.0215  -0.566
##  7 surface_pressure           windspeed_100m             0.247   -0.311   -0.558
##  8 dewpoint_2m                soil_moisture_28_to_10…    0.0416  -0.508   -0.549
##  9 surface_pressure           windspeed_10m              0.195   -0.304   -0.498
## 10 dewpoint_2m                soil_moisture_7_to_28cm    0.0435  -0.448   -0.491
## 11 surface_pressure           windgusts_10m              0.133   -0.332   -0.465
## 12 dewpoint_2m                soil_moisture_0_to_7cm     0.0722  -0.393   -0.465
## 13 relativehumidity_2m        soil_moisture_28_to_10…    0.401   -0.0532  -0.454
## 14 relativehumidity_2m        soil_moisture_100_to_2…    0.374   -0.0545  -0.429
## 15 cloudcover                 surface_pressure           0.279   -0.144   -0.423
## 16 relativehumidity_2m        soil_moisture_7_to_28cm    0.441    0.0357  -0.406
## 17 soil_moisture_0_to_7cm     soil_moisture_100_to_2…    0.796    0.402   -0.393
## 18 cloudcover_low             surface_pressure           0.197   -0.186   -0.383
## 19 surface_pressure           weathercode                0.140   -0.240   -0.380
## 20 cloudcover_mid             surface_pressure           0.142   -0.221   -0.362
tstCorDelta %>% arrange(desc(delta)) %>% head(20)
## # A tibble: 20 × 5
##    v1                         v2                       value_all value_bos delta
##    <chr>                      <chr>                        <dbl>     <dbl> <dbl>
##  1 soil_moisture_100_to_255cm year                        0.0387   0.739   0.701
##  2 pressure_msl               surface_pressure            0.401    1.00    0.599
##  3 soil_moisture_28_to_100cm  year                       -0.0677   0.386   0.454
##  4 soil_moisture_7_to_28cm    year                       -0.0426   0.406   0.449
##  5 surface_pressure           vapor_pressure_deficit     -0.493   -0.0499  0.443
##  6 dewpoint_2m                vapor_pressure_deficit     -0.0678   0.305   0.373
##  7 soil_moisture_100_to_255cm vapor_pressure_deficit     -0.337   -0.00259 0.334
##  8 soil_moisture_0_to_7cm     year                        0.0110   0.334   0.323
##  9 doy                        soil_temperature_100_to…    0.453    0.764   0.311
## 10 relativehumidity_2m        soil_temperature_0_to_7…   -0.253    0.00738 0.260
## 11 dewpoint_2m                soil_temperature_0_to_7…    0.623    0.875   0.252
## 12 relativehumidity_2m        soil_temperature_28_to_…   -0.133    0.116   0.249
## 13 relativehumidity_2m        temperature_2m             -0.213    0.0356  0.249
## 14 relativehumidity_2m        soil_temperature_7_to_2…   -0.139    0.109   0.247
## 15 relativehumidity_2m        soil_temperature_100_to…   -0.120    0.123   0.243
## 16 pressure_msl               vapor_pressure_deficit     -0.292   -0.0521  0.240
## 17 dewpoint_2m                temperature_2m              0.691    0.917   0.226
## 18 cloudcover                 temperature_2m             -0.162    0.0534  0.215
## 19 direct_normal_irradiance   surface_pressure           -0.118    0.0838  0.201
## 20 dewpoint_2m                soil_temperature_7_to_2…    0.675    0.874   0.199

The process is repeated, using an aggregation of correlations in each of the seven cities:

keyCities <- allCity %>% pull(src) %>% unique()
keyCities
## [1] "NYC"     "LA"      "Chicago" "Houston" "Vegas"   "Miami"   "Boston"
corVarsTrainEach <- keyCities %>% 
    map_dfr(.f=function(x) makeHeatMap(allCity %>% filter(src==x), 
                                       vecSelect=varsTrain[!varsTrain %in% ifelse(x=="Miami", "snowfall", "")],
                                       returnData=TRUE, 
                                       plotMap=FALSE
                                       ) %>% 
                mutate(v1=pmin(as.character(Var1), as.character(Var2)), 
                       v2=pmax(as.character(Var1), as.character(Var2))
                       ) %>% 
                select(v1, v2, value), .id="src") %>%
    mutate(city=keyCities[as.integer(src)]) %>%
    select(src, city, everything())

corVarsTrainEach
## # A tibble: 4,626 × 5
##    src   city  v1                            v2                           value
##    <chr> <chr> <chr>                         <chr>                        <dbl>
##  1 1     NYC   soil_temperature_7_to_28cm    soil_temperature_7_to_28cm  1     
##  2 1     NYC   soil_temperature_28_to_100cm  soil_temperature_7_to_28cm  0.952 
##  3 1     NYC   dewpoint_2m                   soil_temperature_7_to_28cm  0.883 
##  4 1     NYC   soil_temperature_0_to_7cm     soil_temperature_7_to_28cm  0.954 
##  5 1     NYC   soil_temperature_7_to_28cm    temperature_2m              0.932 
##  6 1     NYC   apparent_temperature          soil_temperature_7_to_28cm  0.937 
##  7 1     NYC   soil_temperature_100_to_255cm soil_temperature_7_to_28cm  0.584 
##  8 1     NYC   doy                           soil_temperature_7_to_28cm  0.360 
##  9 1     NYC   cloudcover_mid                soil_temperature_7_to_28cm -0.100 
## 10 1     NYC   cloudcover_high               soil_temperature_7_to_28cm  0.0134
## # ℹ 4,616 more rows
corVarsTrainEach %>% 
    arrange(desc(value)) %>% 
    filter(v1!=v2) %>% 
    slice(1:15, (nrow(.)-14):nrow(.)) %>%
    print(n=30)
## # A tibble: 30 × 5
##    src   city    v1                         v2                             value
##    <chr> <chr>   <chr>                      <chr>                          <dbl>
##  1 6     Miami   precipitation              rain                           1    
##  2 6     Miami   pressure_msl               surface_pressure               1.00 
##  3 7     Boston  pressure_msl               surface_pressure               1.00 
##  4 2     LA      precipitation              rain                           1.00 
##  5 4     Houston pressure_msl               surface_pressure               1.00 
##  6 4     Houston precipitation              rain                           1.00 
##  7 1     NYC     pressure_msl               surface_pressure               1.00 
##  8 5     Vegas   precipitation              rain                           0.999
##  9 3     Chicago pressure_msl               surface_pressure               0.994
## 10 1     NYC     apparent_temperature       temperature_2m                 0.993
## 11 3     Chicago apparent_temperature       temperature_2m                 0.993
## 12 7     Boston  apparent_temperature       temperature_2m                 0.993
## 13 5     Vegas   direct_radiation           shortwave_radiation            0.989
## 14 5     Vegas   apparent_temperature       temperature_2m                 0.988
## 15 3     Chicago precipitation              rain                           0.988
## 16 1     NYC     soil_moisture_0_to_7cm     soil_temperature_7_to_28cm    -0.680
## 17 1     NYC     soil_moisture_7_to_28cm    soil_temperature_0_to_7cm     -0.681
## 18 1     NYC     soil_moisture_100_to_255cm soil_temperature_100_to_255cm -0.687
## 19 1     NYC     soil_moisture_28_to_100cm  soil_temperature_100_to_255cm -0.688
## 20 4     Houston et0_fao_evapotranspiration relativehumidity_2m           -0.689
## 21 1     NYC     soil_moisture_28_to_100cm  soil_temperature_7_to_28cm    -0.707
## 22 1     NYC     soil_moisture_7_to_28cm    soil_temperature_28_to_100cm  -0.715
## 23 1     NYC     soil_moisture_7_to_28cm    soil_temperature_7_to_28cm    -0.720
## 24 3     Chicago soil_moisture_100_to_255cm year                          -0.758
## 25 1     NYC     soil_moisture_28_to_100cm  soil_temperature_28_to_100cm  -0.777
## 26 4     Houston relativehumidity_2m        vapor_pressure_deficit        -0.791
## 27 2     LA      relativehumidity_2m        vapor_pressure_deficit        -0.818
## 28 5     Vegas   soil_moisture_28_to_100cm  year                          -0.828
## 29 5     Vegas   soil_moisture_7_to_28cm    year                          -0.831
## 30 6     Miami   relativehumidity_2m        vapor_pressure_deficit        -0.868

Comparison of aggregated individual city correlations to the overall correlations:

corVarsTrainDelta <- corVarsTrain %>% 
    mutate(v1=pmin(as.character(Var1), as.character(Var2)), 
           v2=pmax(as.character(Var1), as.character(Var2))
    ) %>% 
    select(v1, v2, value_all=value) %>% 
    full_join(corVarsTrainEach %>% 
                  group_by(v1, v2) %>% 
                  summarize(value=mean(value), .groups="drop"), 
              by=c("v1", "v2")) %>% 
    mutate(delta=value-value_all) 

corVarsTrainDelta %>% 
    arrange(delta) %>% 
    slice(1:15, (nrow(.)-14):nrow(.)) %>%
    print(n=30)
## # A tibble: 30 × 5
##    v1                         v2                       value_all    value  delta
##    <chr>                      <chr>                        <dbl>    <dbl>  <dbl>
##  1 relativehumidity_2m        surface_pressure            0.578  -0.178   -0.756
##  2 dewpoint_2m                surface_pressure            0.364  -0.371   -0.735
##  3 soil_moisture_100_to_255cm soil_moisture_7_to_28cm     0.824   0.156   -0.668
##  4 soil_moisture_0_to_7cm     soil_moisture_100_to_25…    0.796   0.138   -0.658
##  5 soil_moisture_0_to_7cm     surface_pressure            0.594  -0.0212  -0.615
##  6 soil_moisture_100_to_255cm surface_pressure            0.602  -0.00359 -0.606
##  7 soil_moisture_100_to_255cm soil_moisture_28_to_100…    0.859   0.283   -0.576
##  8 soil_moisture_7_to_28cm    surface_pressure            0.586   0.0143  -0.572
##  9 soil_moisture_28_to_100cm  surface_pressure            0.588   0.0277  -0.560
## 10 cloudcover                 surface_pressure            0.279  -0.138   -0.417
## 11 surface_pressure           windspeed_100m              0.247  -0.165   -0.412
## 12 relativehumidity_2m        soil_moisture_100_to_25…    0.374  -0.00330 -0.378
## 13 soil_moisture_0_to_7cm     soil_moisture_28_to_100…    0.842   0.491   -0.351
## 14 surface_pressure           windspeed_10m               0.195  -0.149   -0.344
## 15 relativehumidity_2m        soil_moisture_28_to_100…    0.401   0.0674  -0.333
## 16 vapor_pressure_deficit     windspeed_10m              -0.0519  0.100    0.152
## 17 relativehumidity_2m        soil_temperature_100_to…   -0.120   0.0325   0.153
## 18 et0_fao_evapotranspiration soil_moisture_100_to_25…   -0.138   0.0221   0.160
## 19 diffuse_radiation          vapor_pressure_deficit      0.283   0.449    0.166
## 20 soil_moisture_28_to_100cm  vapor_pressure_deficit     -0.404  -0.234    0.170
## 21 soil_moisture_100_to_255cm temperature_2m             -0.204  -0.0213   0.183
## 22 soil_moisture_100_to_255cm soil_temperature_0_to_7…   -0.221  -0.0296   0.192
## 23 soil_moisture_100_to_255cm soil_temperature_28_to_…   -0.308  -0.115    0.193
## 24 dewpoint_2m                vapor_pressure_deficit     -0.0678  0.136    0.203
## 25 soil_moisture_100_to_255cm soil_temperature_7_to_2…   -0.257  -0.0527   0.205
## 26 direct_normal_irradiance   surface_pressure           -0.118   0.0875   0.205
## 27 doy                        soil_temperature_100_to…    0.453   0.739    0.286
## 28 soil_moisture_100_to_255cm vapor_pressure_deficit     -0.337  -0.0176   0.319
## 29 surface_pressure           vapor_pressure_deficit     -0.493  -0.0665   0.427
## 30 pressure_msl               surface_pressure            0.401   0.983    0.582

Differences by variable are explored:

corVarsTrain %>% 
    mutate(v1=pmin(as.character(Var1), as.character(Var2)), 
           v2=pmax(as.character(Var1), as.character(Var2))
    ) %>% 
    select(v1, v2, value_all=value) %>% 
    full_join(corVarsTrainEach, by=c("v1", "v2")) %>% 
    mutate(delta=value-value_all) %>%
    filter(v1!=v2) %>%
    select(city, v1, v2, delta) %>%
    bind_rows(., ., .id="src") %>%
    mutate(vrbl=case_when(src=="1"~v1, src=="2"~v2, TRUE~NA)) %>%
    ggplot(aes(x=fct_reorder(vrbl, delta, .fun=function(x) diff(range(x))), y=delta)) + 
    geom_boxplot(fill="lightblue") + 
    coord_flip() + 
    geom_hline(yintercept=0, color="black", lty=2) +
    labs(title="Correlation delta (individual minus all-city aggregate)", 
         x=NULL, 
         y="Delta (individual minus all-city aggregate)", 
         subtitle="Calculated for each variable combination and city"
         )

Soil moisture, surface pressure, and dewpoint are among the variables with the highest changes in correlation when calculated n individual cities and the all-city aggregate

An example Simpson’s paradox is MSL pressure vs. surface pressure:

allCity %>% 
    count(src, surface_pressure, pressure_msl) %>% 
    ggplot(aes(x=surface_pressure, pressure_msl)) + 
    geom_smooth(aes(weight=n, color=src), method="lm") + 
    geom_smooth(method="lm", lty=2, aes(weight=n), color="black") + 
    labs(title="Relationship between MSL pressure and surface pressure", 
         subtitle="Dashed black line is overall relationship"
         ) + 
    scale_color_discrete(NULL)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

The process is converted to functional form:

tmpSmoothPlot <- function(df, x, y, xName=x, yName=y, printPlot=TRUE, returnPlot=!isTRUE(printPlot)) {
    
    # FUNCTION ARGUMENTS:
    # df: the data frame
    # x: x variable
    # y: y variable
    # xName: name to describe x variable
    # yName: name to describe y variable
    # printPlot: boolean, should plot be printed?
    # returnPlot: boolean, should plot object be returned?
    
    p1 <- df %>% 
        select(src, all_of(c(x, y))) %>%
        purrr::set_names(c("src", "x1", "y1")) %>%
        count(src, x1, y1) %>% 
        ggplot(aes(x=x1, y=y1)) + 
        geom_smooth(aes(weight=n, color=src), method="lm") + 
        geom_smooth(method="lm", lty=2, aes(weight=n), color="black") + 
        labs(title=paste0("Relationship between ", xName, " and ", yName), 
             subtitle="Dashed black line is overall relationship", 
             y=if(y!=yName) paste0(yName, "\n(", y, ")") else y,
             x=if(x!=xName) paste0(xName, "\n(", x, ")") else x
             ) + 
    scale_color_discrete(NULL)
    
    # Print plot if requested
    if(isTRUE(printPlot)) print(p1)
    
    # Return plot if requested
    if(isTRUE(returnPlot)) return(p1)
    
}

# Example function call
tmpSmoothPlot(allCity, 
              x="surface_pressure", 
              y="pressure_msl", 
              yName="MSL Pressure", 
              xName="Surface Pressure"
              )
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

The function is tested on two additional sets of metrics:

gridExtra::grid.arrange(tmpSmoothPlot(allCity %>% 
                                          mutate(across(c("apparent_temperature", "dewpoint_2m"), 
                                                        .fns=function(x) round(x)
                                                        )
                                                 ), 
                                      x="dewpoint_2m", 
                                      y="apparent_temperature", 
                                      xName="Dew Point", 
                                      yName="Apparent Temperature", 
                                      printPlot=FALSE
                                      ), 
                        tmpSmoothPlot(allCity %>% 
                                          mutate(across(c("surface_pressure", "dewpoint_2m"), 
                                                        .fns=function(x) round(x)
                                                        )
                                                 ), 
                                      x="dewpoint_2m", 
                                      y="surface_pressure", 
                                      xName="Dew Point", 
                                      yName="Surface Pressure", 
                                      printPlot=FALSE
                                      ), 
                        nrow=1
                        )
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

The function is updated to allow for rounding:

tmpSmoothPlot <- function(df, 
                          x, 
                          y, 
                          xRound=NULL, 
                          yRound=NULL,
                          xName=x, 
                          yName=y, 
                          printPlot=TRUE, 
                          returnPlot=!isTRUE(printPlot)
                          ) {
    
    # FUNCTION ARGUMENTS:
    # df: the data frame
    # x: x variable
    # y: y variable
    # {x,y}Round: rounding to apply to vector {x,y} using function autoRound()
    #             NULL means no rounding (default)
    #             -1L means make an estimate based on data (around 100 buckets created)
    #             a positive float or integer means round everything to the nearest multiple
    # xName: name to describe x variable
    # yName: name to describe y variable
    # printPlot: boolean, should plot be printed?
    # returnPlot: boolean, should plot object be returned?
    
    p1 <- df %>% 
        select(src, all_of(c(x, y))) %>%
        purrr::set_names(c("src", "x1", "y1")) %>%
        mutate(x1=autoRound(x1, rndTo=xRound), 
               y1=autoRound(y1, rndTo=yRound)
               ) %>%
        count(src, x1, y1) %>% 
        ggplot(aes(x=x1, y=y1)) + 
        geom_smooth(aes(weight=n, color=src), method="lm") + 
        geom_smooth(method="lm", lty=2, aes(weight=n), color="black") + 
        labs(title=paste0("Relationship between ", xName, " and ", yName), 
             subtitle="Dashed black line is overall relationship", 
             y=if(y!=yName) paste0(yName, "\n(", y, ")") else y,
             x=if(x!=xName) paste0(xName, "\n(", x, ")") else x
             ) + 
    scale_color_discrete(NULL)
    
    # Print plot if requested
    if(isTRUE(printPlot)) print(p1)
    
    # Return plot if requested
    if(isTRUE(returnPlot)) return(p1)
    
}

The updated function is tested on elements with many buckets:

# Example function call (no rounding)
t0 <- Sys.time()
system.time(
    tmpSmoothPlot(allCity, 
                  x="direct_normal_irradiance", 
                  y="shortwave_radiation", 
                  xName="Direct Solar Radiation", 
                  yName="Shortwave Solar Radiation"
                  )
    )
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

##    user  system elapsed 
##    4.38    0.43    4.84
Sys.time() - t0
## Time difference of 5.019886 secs
# Example function call (rounding)
t0 <- Sys.time()
system.time(
    tmpSmoothPlot(allCity, 
                  x="direct_normal_irradiance", 
                  y="shortwave_radiation", 
                  xRound=-1L, 
                  yRound=-1L,
                  xName="Direct Solar Radiation", 
                  yName="Shortwave Solar Radiation"
                  )
    )
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

##    user  system elapsed 
##    0.93    0.22    1.12
Sys.time() - t0
## Time difference of 1.360824 secs

A simple model is run to predict surface pressure as a function of dewpoint, without considering location:

# Create model
tstLM <- allCity %>%
    select(dewpoint_2m, surface_pressure) %>%
    lm(surface_pressure ~ dewpoint_2m, data=.)

# Summary
summary(tstLM)
## 
## Call:
## lm(formula = surface_pressure ~ dewpoint_2m, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -82.042 -15.314   5.729  17.119  83.092 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.871e+02  3.631e-02 27182.2   <2e-16 ***
## dewpoint_2m 9.507e-01  2.632e-03   361.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 26.72 on 854206 degrees of freedom
## Multiple R-squared:  0.1325, Adjusted R-squared:  0.1325 
## F-statistic: 1.305e+05 on 1 and 854206 DF,  p-value: < 2.2e-16
# Review of predictions
allCity %>%
    select(src, dewpoint_2m, surface_pressure) %>%
    mutate(pred=predict(tstLM, newdata=.)) %>%
    mutate(across(c(surface_pressure, pred), .fns=function(x) autoRound(x))) %>%
    count(surface_pressure, pred) %>%
    ggplot(aes(x=pred, y=surface_pressure)) + 
    geom_point(aes(size=n), alpha=0.25) + 
    geom_smooth(aes(weight=n), method="lm") + 
    geom_abline(slope=1, intercept=0, lty=2, color="red") + 
    labs(x="Prediction", 
         y="Actual", 
         title="Predictions for Surface Pressure", 
         subtitle="Surface Pressure ~ Dewpoint"
         ) + 
    scale_size_continuous(NULL)
## `geom_smooth()` using formula = 'y ~ x'

The model is updated to predict surface pressure as a function of dewpoint, considering location:

# Create model
tstLM_002 <- allCity %>%
    select(dewpoint_2m, surface_pressure, src) %>%
    lm(surface_pressure ~ dewpoint_2m:src + src, data=.)

# Summary
summary(tstLM_002)
## 
## Call:
## lm(formula = surface_pressure ~ dewpoint_2m:src + src, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.856  -2.901   0.233   3.277  28.618 
## 
## Coefficients:
##                          Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)             1.017e+03  1.844e-02 55129.10   <2e-16 ***
## srcChicago             -2.021e+01  2.594e-02  -779.18   <2e-16 ***
## srcHouston              4.918e+00  4.032e-02   121.98   <2e-16 ***
## srcLA                  -3.904e+01  2.910e-02 -1341.57   <2e-16 ***
## srcMiami                5.985e+00  7.381e-02    81.09   <2e-16 ***
## srcNYC                 -2.917e+00  2.700e-02  -108.04   <2e-16 ***
## srcVegas               -8.141e+01  2.487e-02 -3273.38   <2e-16 ***
## dewpoint_2m:srcBoston  -1.907e-01  1.561e-03  -122.19   <2e-16 ***
## dewpoint_2m:srcChicago -2.323e-01  1.493e-03  -155.62   <2e-16 ***
## dewpoint_2m:srcHouston -4.411e-01  2.017e-03  -218.66   <2e-16 ***
## dewpoint_2m:srcLA      -2.287e-01  2.261e-03  -101.17   <2e-16 ***
## dewpoint_2m:srcMiami   -3.044e-01  3.512e-03   -86.66   <2e-16 ***
## dewpoint_2m:srcNYC     -1.990e-01  1.607e-03  -123.81   <2e-16 ***
## dewpoint_2m:srcVegas   -1.471e-01  2.120e-03   -69.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.725 on 854194 degrees of freedom
## Multiple R-squared:  0.9602, Adjusted R-squared:  0.9602 
## F-statistic: 1.585e+06 on 13 and 854194 DF,  p-value: < 2.2e-16
# Review of predictions
allCity %>%
    select(src, dewpoint_2m, surface_pressure) %>%
    mutate(pred=predict(tstLM_002, newdata=.)) %>%
    mutate(across(c(surface_pressure, pred), .fns=function(x) autoRound(x))) %>%
    count(src, surface_pressure, pred) %>%
    ggplot(aes(x=pred, y=surface_pressure)) + 
    geom_point(aes(size=n, color=src), alpha=0.25) + 
    geom_smooth(aes(weight=n), method="lm") + 
    geom_abline(slope=1, intercept=0, lty=2, color="red") + 
    labs(x="Prediction", 
         y="Actual", 
         title="Predictions for Surface Pressure", 
         subtitle="Surface Pressure ~ Dewpoint:City + City"
         ) + 
    scale_size_continuous(NULL) + 
    scale_color_discrete(NULL)
## `geom_smooth()` using formula = 'y ~ x'

Long term daily data is downloaded for Atlanta:

# Daily data download for Atlanta, GA
testURLDaily <- helperOpenMeteoURL(cityName="Atlanta GA", 
                                   dailyIndices=1:nrow(tblMetricsDaily),
                                   startDate="1960-01-01", 
                                   endDate="2023-12-31", 
                                   tz="US/Eastern"
                                   )
## 
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=33.76&longitude=-84.42&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FEastern"
# Download file
if(!file.exists("testOM_daily_atl.json")) {
    fileDownload(fileName="testOM_daily_atl.json", url=testURLDaily)
} else {
    cat("\nFile testOM_daily_atl.json already exists, skipping download\n")
}
## 
## File testOM_daily_atl.json already exists, skipping download

The daily dataset is loaded:

# Read daily JSON file
atlOMDaily <- formatOpenMeteoJSON("testOM_daily_atl.json")
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## $tblDaily
## # A tibble: 23,376 × 18
##    date       time       weathercode temperature_2m_max temperature_2m_min
##    <date>     <chr>            <int>              <dbl>              <dbl>
##  1 1960-01-01 1960-01-01          71                2.6                0.1
##  2 1960-01-02 1960-01-02          61               11.4                1.5
##  3 1960-01-03 1960-01-03          63               14.7                2  
##  4 1960-01-04 1960-01-04          53                7.3               -0.1
##  5 1960-01-05 1960-01-05          51                7.1                0.2
##  6 1960-01-06 1960-01-06          53                7.3                4.5
##  7 1960-01-07 1960-01-07          61                7.4                3.1
##  8 1960-01-08 1960-01-08           2               10.7                0.4
##  9 1960-01-09 1960-01-09           3               15                 -0.8
## 10 1960-01-10 1960-01-10           3               17.5                4  
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## #   apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## #   snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## #   windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## #   winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## #   et0_fao_evapotranspiration <dbl>
## 
## $tblHourly
## NULL
## 
## $tblUnits
## # A tibble: 17 × 4
##    metricType  name                       value      description                
##    <chr>       <chr>                      <chr>      <chr>                      
##  1 daily_units time                       "iso8601"  <NA>                       
##  2 daily_units weathercode                "wmo code" The most severe weather co…
##  3 daily_units temperature_2m_max         "deg C"    Maximum and minimum daily …
##  4 daily_units temperature_2m_min         "deg C"    Maximum and minimum daily …
##  5 daily_units apparent_temperature_max   "deg C"    Maximum and minimum daily …
##  6 daily_units apparent_temperature_min   "deg C"    Maximum and minimum daily …
##  7 daily_units precipitation_sum          "mm"       Sum of daily precipitation…
##  8 daily_units rain_sum                   "mm"       Sum of daily rain          
##  9 daily_units snowfall_sum               "cm"       Sum of daily snowfall      
## 10 daily_units precipitation_hours        "h"        The number of hours with r…
## 11 daily_units sunrise                    "iso8601"  Sun rise and set times     
## 12 daily_units sunset                     "iso8601"  Sun rise and set times     
## 13 daily_units windspeed_10m_max          "km/h"     Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max          "km/h"     Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg "     Dominant wind direction    
## 16 daily_units shortwave_radiation_sum    "MJ/m²"    The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm"       Daily sum of ET0 Reference…
## 
## $tblDescription
## # A tibble: 1 × 7
##   latitude longitude generationtime_ms utc_offset_seconds timezone  
##      <dbl>     <dbl>             <dbl>              <int> <chr>     
## 1     33.8     -84.4              388.             -18000 US/Eastern
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
## 
## 
## latitude: 33.77856
## longitude: -84.40299
## generationtime_ms: 388.082
## utc_offset_seconds: -18000
## timezone: US/Eastern
## timezone_abbreviation: EST
## elevation: 302
# Sample records of tibble
atlOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <int> 71, 61, 63, 53, 51, 53, 61, 2, 3, 3, 3, 3, …
## $ temperature_2m_max         <dbl> 2.6, 11.4, 14.7, 7.3, 7.1, 7.3, 7.4, 10.7, …
## $ temperature_2m_min         <dbl> 0.1, 1.5, 2.0, -0.1, 0.2, 4.5, 3.1, 0.4, -0…
## $ apparent_temperature_max   <dbl> -2.3, 9.2, 13.1, 2.8, 4.4, 6.1, 4.9, 6.9, 1…
## $ apparent_temperature_min   <dbl> -4.0, -3.1, -2.7, -4.7, -3.3, 1.4, -0.5, -4…
## $ precipitation_sum          <dbl> 3.0, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ rain_sum                   <dbl> 2.5, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ snowfall_sum               <dbl> 0.35, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 13, 21, 5, 3, 3, 17, 8, 0, 0, 0, 0, 0, 0, 5…
## $ sunrise                    <chr> "1960-01-01T07:42", "1960-01-02T07:42", "19…
## $ sunset                     <chr> "1960-01-01T17:39", "1960-01-02T17:40", "19…
## $ windspeed_10m_max          <dbl> 20.9, 18.7, 23.8, 15.5, 9.2, 11.4, 19.2, 17…
## $ windgusts_10m_max          <dbl> 39.6, 41.4, 56.2, 44.6, 33.1, 41.8, 39.2, 3…
## $ winddirection_10m_dominant <int> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
## $ shortwave_radiation_sum    <dbl> 2.38, 1.83, 10.18, 9.37, 3.88, 1.77, 5.34, …
## $ et0_fao_evapotranspiration <dbl> 0.57, 0.38, 1.34, 1.36, 0.65, 0.33, 0.76, 1…

Variables are converted to proper data type:

dfDailyATL <- atlOMDaily$tblDaily %>%
    mutate(weathercode=factor(weathercode), 
           sunrise_chr=sunrise, 
           sunset_chr=sunset,
           sunrise=lubridate::ymd_hm(sunrise), 
           sunset=lubridate::ymd_hm(sunset), 
           fct_winddir=factor(winddirection_10m_dominant)
           )
glimpse(dfDailyATL)
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 71, 61, 63, 53, 51, 53, 61, 2, 3, 3, 3, 3, …
## $ temperature_2m_max         <dbl> 2.6, 11.4, 14.7, 7.3, 7.1, 7.3, 7.4, 10.7, …
## $ temperature_2m_min         <dbl> 0.1, 1.5, 2.0, -0.1, 0.2, 4.5, 3.1, 0.4, -0…
## $ apparent_temperature_max   <dbl> -2.3, 9.2, 13.1, 2.8, 4.4, 6.1, 4.9, 6.9, 1…
## $ apparent_temperature_min   <dbl> -4.0, -3.1, -2.7, -4.7, -3.3, 1.4, -0.5, -4…
## $ precipitation_sum          <dbl> 3.0, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ rain_sum                   <dbl> 2.5, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ snowfall_sum               <dbl> 0.35, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 13, 21, 5, 3, 3, 17, 8, 0, 0, 0, 0, 0, 0, 5…
## $ sunrise                    <dttm> 1960-01-01 07:42:00, 1960-01-02 07:42:00, …
## $ sunset                     <dttm> 1960-01-01 17:39:00, 1960-01-02 17:40:00, …
## $ windspeed_10m_max          <dbl> 20.9, 18.7, 23.8, 15.5, 9.2, 11.4, 19.2, 17…
## $ windgusts_10m_max          <dbl> 39.6, 41.4, 56.2, 44.6, 33.1, 41.8, 39.2, 3…
## $ winddirection_10m_dominant <int> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
## $ shortwave_radiation_sum    <dbl> 2.38, 1.83, 10.18, 9.37, 3.88, 1.77, 5.34, …
## $ et0_fao_evapotranspiration <dbl> 0.57, 0.38, 1.34, 1.36, 0.65, 0.33, 0.76, 1…
## $ sunrise_chr                <chr> "1960-01-01T07:42", "1960-01-02T07:42", "19…
## $ sunset_chr                 <chr> "1960-01-01T17:39", "1960-01-02T17:40", "19…
## $ fct_winddir                <fct> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…

Averages by month for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year, month) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    group_by(month) %>%
    summarize(across(-c(year), .fns=mean)) %>%
    pivot_longer(-c(month)) %>%
    ggplot(aes(x=month, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Atlanta 1960-2023)")

Averages by month for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(month, wc, winddir) %>%
    pivot_longer(-c(month)) %>%
    count(month, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=month, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=2) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Atlanta 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Averages by year for select variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    pivot_longer(-c(year)) %>%
    ggplot(aes(x=year, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Annual averages for key metrics (Atlanta 1960-2023)")

Averages by year for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(year, wc, winddir) %>%
    pivot_longer(-c(year)) %>%
    count(year, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=year, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=1) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Atlanta 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Boxplots for maximum windspeed are created:

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, windspeed_10m_max) %>%
    ggplot(aes(x=month, y=windspeed_10m_max)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Maximum daily wind speed (kph)", 
         title="Maximum daily windspeed by month (Atlanta 1960-2023)"
         )

Boxplots for precipitation are created:

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_hours) %>%
    ggplot(aes(x=month, y=precipitation_hours)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation hours", 
         title="Daily precipitation hours by month (Atlanta 1960-2023)"
         )

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_sum) %>%
    ggplot(aes(x=month, y=precipitation_sum)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation (mm)", 
         title="Daily precipitation by month (Atlanta 1960-2023)"
         )

Boxplots for temperature are created:

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_max) %>%
    ggplot(aes(x=month, y=temperature_2m_max)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily high temperature (C)", 
         title="Daily high temperature by month (Atlanta 1960-2023)"
         )

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_min) %>%
    ggplot(aes(x=month, y=temperature_2m_min)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily low temperature (C)", 
         title="Daily low temperature by month (Atlanta 1960-2023)"
         )

ACF and PACF are explored for maximum temperature:

acfTemp <- acf(dfDailyATL$temperature_2m_max, lag.max=1000)

as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 365 737
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 185 549 916
pacfTemp <- pacf(dfDailyATL$temperature_2m_max, lag.max=1000)

pacf(dfDailyATL$temperature_2m_max, lag.max=50)

As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year

ACF and PACF are explored for precipitation:

acfPrecip <- acf(dfDailyATL$precipitation_sum, lag.max=1000)

acf(dfDailyATL$precipitation_sum, lag.max=50)

as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
##  [1]  28  52  78 100 126 140 157 187 202 213 250 284 311 335 361 377 406 428 442
## [20] 465 481 502 523 540 553 585 609 636 650 680 692 710 728 744 757 769 791 803
## [39] 825 852 870 897 913 930 941 961 989
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  18  56  71  97 120 134 166 178 195 210 226 238 268 288 302 320 332 344 359
## [20] 383 409 420 438 452 471 483 497 533 551 565 581 598 613 633 654 676 695 719
## [39] 743 766 799 812 842 862 902 920 948 977
pacfPrecip <- pacf(dfDailyATL$precipitation_sum, lag.max=1000)

pacf(dfDailyATL$precipitation_sum, lag.max=50)

Precipitation by contrast has little seasonality and mostly just a correlation to the previous day’s value

ACF and PACF are explored for windspeed:

acfWind <- acf(dfDailyATL$windspeed_10m_max, lag.max=1000)

acf(dfDailyATL$windspeed_10m_max, lag.max=50)

as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
##  [1]  13 172 183 338 361 378 553 738 894 921
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 174 185 555 730 891 910 944
pacfWind <- pacf(dfDailyATL$windspeed_10m_max, lag.max=1000)

pacf(dfDailyATL$windspeed_10m_max, lag.max=50)

Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.2 for wind speed vs. ~0.75 for temperature)

A boxplot for delta daily temperature is created:

dfDailyATL %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=temperature_2m_max-lag(temperature_2m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum temperature")

Daily temperature changes are generally larger in magnitude during the colder season

A boxplot for delta daily wind speed is created:

dfDailyATL %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=windspeed_10m_max-lag(windspeed_10m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum wind speed")

Daily wind speed changes are generally larger in magnitude during the colder season

A boxplot for delta daily precipitation is created:

dfDailyATL %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=precipitation_sum-lag(precipitation_sum)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in precipitation")

Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers

Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21 <- dfDailyATL %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21 %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean") + 
    theme(legend.position="none")

Outliers are likely much more important in driving average precipitation than in driving average temperature

Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21_sd <- dfDailyATL %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    group_by(doy) %>%
    mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
    ungroup() %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21_sd %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean)) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") + 
    theme(legend.position="none")

Means and standard deviations are plotted together:

df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- daily standard deviation", 
         title="Rolling 21-day mean +/- one rolling 21-day sd"
         ) + 
    theme(legend.position="none")

Means and approximate SEM are plotted together:

nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- 1 SEM (approx)", 
         title="Rolling 21-day mean +/- 1 SEM (approx)"
         ) + 
    theme(legend.position="none")

Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.

Percentage of cumulative precipitation by volume of daily precipitation is explored:

tmpPlotData <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    group_by(precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n)) 

tmpPlotData %>%
    pivot_longer(cols=-c(precipitation_sum)) %>%
    ggplot(aes(x=precipitation_sum, y=value)) + 
    geom_line(data=~filter(., name %in% c("cpp", "cpn")), 
              aes(group=name, color=c("cpn"="# events", "cpp"="precip")[name])
              ) + 
    labs(x="Daily precipitation (mm)", y="% total (cumul)") + 
    scale_color_discrete("Metric")

Around half of days have no precipitation. Around 25% of total precipitation comes from the rare day with 25+ mm (about an inch) of precipitation

Total precipitation by decade is also explored, sorted by daily precipitation:

dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
    group_by(decade, precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    group_by(decade) %>%
    mutate(csp=cumsum(precip), ntot=sum(n), meancsp=365*csp/ntot) %>%
    ungroup() %>%
    ggplot(aes(x=precipitation_sum, y=meancsp)) + 
    geom_line(aes(group=factor(decade), color=factor(decade)), lwd=1) + 
    labs(x="Daily precipitation (mm)", 
         y="Cumulative annual precipitation (mm)", 
         title="Average annual precipitation by decade", 
         subtitle="Cumulative, for daily precipitation amounts LTE x-axis"
         ) + 
    scale_color_discrete("Decade")

Trends look similar by decade for precipitation amounts under 0.5 inch (12.5 mm). Heavier precipitation amounts appear to be more prevalent in the 202s, driving overall precipitation totals ~50% higher

Precipitation appears to have broken trend starting in the late 2010s:

dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>% 
    group_by(decade, year) %>% 
    summarize(precip=sum(precipitation_sum), .groups="drop") %>% 
    ggplot(aes(x=year, y=precip)) + 
    geom_line(aes(color=ifelse(year>=2017, "2017-2023", "pre-2017")), lwd=1) + 
    geom_smooth(aes(color=ifelse(year>=2017, "2017-2023", "pre-2017")), method="lm", lty=2) + 
    geom_point() + 
    labs(x=NULL, y="Annual precipitation (mm)") + 
    scale_color_discrete(NULL) + 
    lims(y=c(0, NA))
## `geom_smooth()` using formula = 'y ~ x'

Precipitation by month pre-2017 is explored:

yMax <- dfDailyATL %>%
    group_by(year=year(date), month=month(date)) %>%
    summarize(precip=sum(precipitation_sum), .groups="drop") %>%
    mutate(precip=ceiling(precip/50)*50) %>%
    pull() %>%
    max()
yMax
## [1] 350
p1 <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), 
           decade=round(year(date)-4.5, -1), 
           month=factor(month.abb[month(date)], levels=month.abb), 
           tp=ifelse(year<2017, "pre", year)
           ) %>% 
    group_by(tp, year, month) %>% 
    summarize(precip=sum(precipitation_sum), .groups="drop") %>% 
    group_by(tp, month) %>% 
    summarize(mu=mean(precip), sd=sd(precip), .groups="drop") %>% 
    ggplot(aes(x=month)) + 
    geom_col(data=~filter(., tp=="pre"), aes(y=mu), fill="lightblue") + 
    geom_errorbar(data=~filter(., tp=="pre"), aes(ymin=mu-sd, ymax=mu+sd), width=0.2) +
    geom_hline(data=~filter(., tp=="pre"), color="red", lty=2, aes(yintercept=mean(mu))) + 
    labs(x=NULL, y="Monthly precipitation (mm)", title="Monthly precipitation +/- 1 SD\n(pre-2017)") + 
    lims(y=c(0, yMax))
p1

Precipitation by month post-2017 is explored:

p2 <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), 
           decade=round(year(date)-4.5, -1), 
           month=factor(month.abb[month(date)], levels=month.abb), 
           tp=ifelse(year<2017, "pre", year)
           ) %>% 
    group_by(tp, year, month) %>% 
    summarize(precip=sum(precipitation_sum), .groups="drop") %>% 
    filter(tp!="pre") %>%
    group_by(month) %>% 
    summarize(mu=mean(precip), mx=max(precip), mn=min(precip), .groups="drop") %>% 
    ggplot(aes(x=month)) + 
    geom_col(aes(y=mu), fill="lightblue") + 
    geom_errorbar(aes(ymin=mn, ymax=mx), width=0.2) +
    geom_hline(color="red", lty=2, aes(yintercept=mean(mu))) + 
    labs(x=NULL, y="Monthly precipitation (mm)", title="Monthly precipitation\nmax-mean-min (post-2017)") + 
    lims(y=c(0, yMax))
p2

The data are plotted on a single plot:

p3 <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), 
           decade=round(year(date)-4.5, -1), 
           month=factor(month.abb[month(date)], levels=month.abb), 
           tp=ifelse(year<2017, "pre", "post")
    ) %>% 
    group_by(tp, year, month) %>% 
    summarize(precip=sum(precipitation_sum), .groups="drop") %>% 
    group_by(tp, month) %>% 
    summarize(mu=mean(precip), sd=sd(precip), mx=max(precip), mn=min(precip), .groups="drop") %>% 
    ggplot(aes(x=month)) + 
    geom_tile(data=~filter(., tp=="pre"), aes(y=mu, height=2*sd), width=0.75, fill="lightblue") + 
    geom_hline(data=~filter(., tp=="pre"), color="red", lty=2, aes(yintercept=mean(mu))) + 
    geom_errorbar(data=~filter(., tp=="post"), aes(ymin=mn, ymax=mx), width=0) +
    geom_point(data=~filter(., tp=="post"), aes(y=mu)) +
    annotate("text", x=1, y=25, label="bars are pre-2017\nmean +/- 1 SD", size=2.5, hjust=0) + 
    annotate("text", x=1, y=300, label="lines/points are post-2017\nmax-mean-min", size=2.5, hjust=0) + 
    labs(x=NULL, y="Monthly precipitation (mm)", title="Monthly precipitation\n(pre/post-2017)") + 
    lims(y=c(0, yMax))
p3

The average number of days with precipitation by month is explored:

yMax <- dfDailyATL %>%
    group_by(year=year(date), month=month(date)) %>%
    summarize(precip=sum(precipitation_sum>0), .groups="drop") %>%
    mutate(precip=ceiling(precip/5)*5) %>%
    pull() %>%
    max()
yMax
## [1] 35
p4 <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), 
           decade=round(year(date)-4.5, -1), 
           month=factor(month.abb[month(date)], levels=month.abb), 
           tp=ifelse(year<2017, "pre", "post")
    ) %>% 
    group_by(tp, year, month) %>% 
    summarize(precip=sum(precipitation_sum>0), .groups="drop") %>% 
    group_by(tp, month) %>% 
    summarize(mu=mean(precip), sd=sd(precip), mx=max(precip), mn=min(precip), .groups="drop") %>% 
    ggplot(aes(x=month)) + 
    geom_tile(data=~filter(., tp=="pre"), aes(y=mu, height=2*sd), width=0.75, fill="lightblue") + 
    geom_hline(data=~filter(., tp=="pre"), color="red", lty=2, aes(yintercept=mean(mu))) + 
    geom_errorbar(data=~filter(., tp=="post"), aes(ymin=mn, ymax=mx), width=0) +
    geom_point(data=~filter(., tp=="post"), aes(y=mu)) +
    annotate("text", x=1, y=5, label="bars are pre-2017\nmean +/- 1 SD", size=2.5, hjust=0) + 
    annotate("text", x=1, y=30, label="lines/points are post-2017\nmax-mean-min", size=2.5, hjust=0) + 
    labs(x=NULL, 
         y="Monthly precipitation (days greater than zero)", 
         title="Monthly days with precipitation\n(pre/post-2017)") + 
    lims(y=c(0, yMax))
p4

Median precipitation on days with precipitation > 0, by month, is explored:

p5 <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), 
           decade=round(year(date)-4.5, -1), 
           month=factor(month.abb[month(date)], levels=month.abb), 
           tp=ifelse(year<2017, "pre", "post")
    ) %>% 
    group_by(tp, month) %>% 
    summarize(nprecip=sum(precipitation_sum>0), 
              tprecip=sum(precipitation_sum),
              mdnprecip=median(ifelse(precipitation_sum>0, precipitation_sum, NA), na.rm=TRUE),
              muprecip=mean(ifelse(precipitation_sum>0, precipitation_sum, NA), na.rm=TRUE),
              .groups="drop"
              ) %>% 
    select(tp, month, mdnprecip, muprecip) %>%
    pivot_longer(-c(tp, month)) %>%
    ggplot(aes(x=month)) + 
    geom_line(aes(y=value, group=tp, color=c("pre"="pre-2017", "post"="post-2017")[tp])) +
    facet_wrap(~c("mdnprecip"="Median (mm) on days with precipitation", 
                  "muprecip"="Mean (mm) on days with precipitation"
                  )[name]
               ) +
    labs(x=NULL, 
         title="Monthly precipitation on days with precipitation greater than zero\n(pre/post-2017)", 
         y="Precipitation (mm)"
         ) + 
    ylim(c(0, NA)) + 
    scale_color_discrete(NULL)
p5

The heaviest 3-day precipitation by year is explored:

p6 <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), 
           decade=round(year(date)-4.5, -1), 
           month=factor(month.abb[month(date)], levels=month.abb), 
           tp=ifelse(year<2017, "pre", "post")
    ) %>% 
    group_by(tp, year) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="r3precip", func=zoo::rollsum, k=3) %>%
    summarize(r3max=max(r3precip, na.rm=TRUE), .groups="drop") %>%
    ggplot(aes(x=year)) + 
    geom_line(aes(y=r3max, group=tp, color=c("pre"="pre-2017", "post"="post-2017")[tp])) +
    labs(x=NULL, 
         title="Maximum 3-day precipitation by year", 
         y="rolling 3-day sum of precipitation (mm)"
         ) + 
    geom_smooth(method="lm", 
                se=TRUE, 
                aes(y=r3max, group=tp, color=c("pre"="pre-2017", "post"="post-2017")[tp])
                ) +
    ylim(c(0, NA)) + 
    scale_color_discrete(NULL)
p6
## `geom_smooth()` using formula = 'y ~ x'

Long term daily data is downloaded for Detroit:

# Daily data download for Detroit, MI
testURLDaily <- helperOpenMeteoURL(cityName="Detroit MI", 
                                   dailyIndices=1:nrow(tblMetricsDaily),
                                   startDate="1960-01-01", 
                                   endDate="2023-12-31", 
                                   tz="US/Eastern"
                                   )
## 
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=42.38&longitude=-83.1&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FEastern"
# Download file
if(!file.exists("testOM_daily_dtw.json")) {
    fileDownload(fileName="testOM_daily_dtw.json", url=testURLDaily)
} else {
    cat("\nFile testOM_daily_dtw.json already exists, skipping download\n")
}
## 
## File testOM_daily_dtw.json already exists, skipping download

The daily dataset is loaded:

# Read daily JSON file
dtwOMDaily <- formatOpenMeteoJSON("testOM_daily_dtw.json")
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## $tblDaily
## # A tibble: 23,376 × 18
##    date       time       weathercode temperature_2m_max temperature_2m_min
##    <date>     <chr>            <int>              <dbl>              <dbl>
##  1 1960-01-01 1960-01-01           3                0.4               -4.4
##  2 1960-01-02 1960-01-02          55                3.2               -2.5
##  3 1960-01-03 1960-01-03          51                2.7               -2.9
##  4 1960-01-04 1960-01-04           3               -3.3               -8.4
##  5 1960-01-05 1960-01-05           3               -6                 -9.7
##  6 1960-01-06 1960-01-06           3               -0.8              -10.7
##  7 1960-01-07 1960-01-07           3                4.1               -4.9
##  8 1960-01-08 1960-01-08          51                2.5               -6.7
##  9 1960-01-09 1960-01-09          51                1.8               -7.3
## 10 1960-01-10 1960-01-10          51                4.7               -2  
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## #   apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## #   snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## #   windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## #   winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## #   et0_fao_evapotranspiration <dbl>
## 
## $tblHourly
## NULL
## 
## $tblUnits
## # A tibble: 17 × 4
##    metricType  name                       value      description                
##    <chr>       <chr>                      <chr>      <chr>                      
##  1 daily_units time                       "iso8601"  <NA>                       
##  2 daily_units weathercode                "wmo code" The most severe weather co…
##  3 daily_units temperature_2m_max         "deg C"    Maximum and minimum daily …
##  4 daily_units temperature_2m_min         "deg C"    Maximum and minimum daily …
##  5 daily_units apparent_temperature_max   "deg C"    Maximum and minimum daily …
##  6 daily_units apparent_temperature_min   "deg C"    Maximum and minimum daily …
##  7 daily_units precipitation_sum          "mm"       Sum of daily precipitation…
##  8 daily_units rain_sum                   "mm"       Sum of daily rain          
##  9 daily_units snowfall_sum               "cm"       Sum of daily snowfall      
## 10 daily_units precipitation_hours        "h"        The number of hours with r…
## 11 daily_units sunrise                    "iso8601"  Sun rise and set times     
## 12 daily_units sunset                     "iso8601"  Sun rise and set times     
## 13 daily_units windspeed_10m_max          "km/h"     Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max          "km/h"     Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg "     Dominant wind direction    
## 16 daily_units shortwave_radiation_sum    "MJ/m²"    The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm"       Daily sum of ET0 Reference…
## 
## $tblDescription
## # A tibble: 1 × 7
##   latitude longitude generationtime_ms utc_offset_seconds timezone  
##      <dbl>     <dbl>             <dbl>              <int> <chr>     
## 1     42.4     -83.1              372.             -18000 US/Eastern
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
## 
## 
## latitude: 42.35501
## longitude: -83.13785
## generationtime_ms: 371.5783
## utc_offset_seconds: -18000
## timezone: US/Eastern
## timezone_abbreviation: GMT-5
## elevation: 199
# Sample records of tibble
dtwOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <int> 3, 55, 51, 3, 3, 3, 3, 51, 51, 51, 3, 63, 6…
## $ temperature_2m_max         <dbl> 0.4, 3.2, 2.7, -3.3, -6.0, -0.8, 4.1, 2.5, …
## $ temperature_2m_min         <dbl> -4.4, -2.5, -2.9, -8.4, -9.7, -10.7, -4.9, …
## $ apparent_temperature_max   <dbl> -4.4, -1.1, -1.8, -9.2, -13.1, -7.0, -3.1, …
## $ apparent_temperature_min   <dbl> -8.1, -7.4, -8.9, -14.9, -15.2, -18.0, -9.6…
## $ precipitation_sum          <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ rain_sum                   <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ snowfall_sum               <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 0, 17, 4, 0, 0, 0, 0, 4, 3, 1, 0, 17, 6, 4,…
## $ sunrise                    <chr> "1960-01-01T08:01", "1960-01-02T08:01", "19…
## $ sunset                     <chr> "1960-01-01T17:10", "1960-01-02T17:10", "19…
## $ windspeed_10m_max          <dbl> 15.0, 21.5, 25.6, 23.0, 28.7, 30.3, 30.9, 3…
## $ windgusts_10m_max          <dbl> 30.2, 42.5, 48.2, 45.0, 54.4, 54.4, 57.6, 6…
## $ winddirection_10m_dominant <int> 109, 196, 261, 247, 257, 239, 223, 272, 161…
## $ shortwave_radiation_sum    <dbl> 6.09, 0.93, 4.91, 4.91, 8.07, 6.47, 4.41, 6…
## $ et0_fao_evapotranspiration <dbl> 0.55, 0.33, 0.81, 0.96, 1.04, 1.07, 1.01, 1…

Variables are converted to proper data type:

dfDailyDTW <- dtwOMDaily$tblDaily %>%
    mutate(weathercode=factor(weathercode), 
           sunrise_chr=sunrise, 
           sunset_chr=sunset,
           sunrise=lubridate::ymd_hm(sunrise), 
           sunset=lubridate::ymd_hm(sunset), 
           fct_winddir=factor(winddirection_10m_dominant)
           )
glimpse(dfDailyDTW)
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 3, 55, 51, 3, 3, 3, 3, 51, 51, 51, 3, 63, 6…
## $ temperature_2m_max         <dbl> 0.4, 3.2, 2.7, -3.3, -6.0, -0.8, 4.1, 2.5, …
## $ temperature_2m_min         <dbl> -4.4, -2.5, -2.9, -8.4, -9.7, -10.7, -4.9, …
## $ apparent_temperature_max   <dbl> -4.4, -1.1, -1.8, -9.2, -13.1, -7.0, -3.1, …
## $ apparent_temperature_min   <dbl> -8.1, -7.4, -8.9, -14.9, -15.2, -18.0, -9.6…
## $ precipitation_sum          <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ rain_sum                   <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ snowfall_sum               <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 0, 17, 4, 0, 0, 0, 0, 4, 3, 1, 0, 17, 6, 4,…
## $ sunrise                    <dttm> 1960-01-01 08:01:00, 1960-01-02 08:01:00, …
## $ sunset                     <dttm> 1960-01-01 17:10:00, 1960-01-02 17:10:00, …
## $ windspeed_10m_max          <dbl> 15.0, 21.5, 25.6, 23.0, 28.7, 30.3, 30.9, 3…
## $ windgusts_10m_max          <dbl> 30.2, 42.5, 48.2, 45.0, 54.4, 54.4, 57.6, 6…
## $ winddirection_10m_dominant <int> 109, 196, 261, 247, 257, 239, 223, 272, 161…
## $ shortwave_radiation_sum    <dbl> 6.09, 0.93, 4.91, 4.91, 8.07, 6.47, 4.41, 6…
## $ et0_fao_evapotranspiration <dbl> 0.55, 0.33, 0.81, 0.96, 1.04, 1.07, 1.01, 1…
## $ sunrise_chr                <chr> "1960-01-01T08:01", "1960-01-02T08:01", "19…
## $ sunset_chr                 <chr> "1960-01-01T17:10", "1960-01-02T17:10", "19…
## $ fct_winddir                <fct> 109, 196, 261, 247, 257, 239, 223, 272, 161…

Averages by month for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year, month) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    group_by(month) %>%
    summarize(across(-c(year), .fns=mean)) %>%
    pivot_longer(-c(month)) %>%
    ggplot(aes(x=month, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Detroit 1960-2023)")

Averages by month for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(month, wc, winddir) %>%
    pivot_longer(-c(month)) %>%
    count(month, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=month, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=2) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Detroit 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Averages by year for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    pivot_longer(-c(year)) %>%
    ggplot(aes(x=year, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Annual averages for key metrics (Detroit 1960-2023)")

Averages by year for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(year, wc, winddir) %>%
    pivot_longer(-c(year)) %>%
    count(year, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=year, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=1) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Detroit 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

The apparent decrease in days with northerly winds is explored further:

dfDailyWind <- dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=300~"1. N", 
                             winddirection_10m_dominant>240~"2. E/W", 
                             winddirection_10m_dominant>=120~"3. S", 
                             winddirection_10m_dominant>60~"2. E/W", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             )
           ) %>%
    select(year, month, date, winddir, winddirection_10m_dominant, windspeed_10m_max)
dfDailyWind
## # A tibble: 23,376 × 6
##     year month date       winddir winddirection_10m_dominant windspeed_10m_max
##    <dbl> <fct> <date>     <chr>                        <int>             <dbl>
##  1  1960 Jan   1960-01-01 2. E/W                         109              15  
##  2  1960 Jan   1960-01-02 3. S                           196              21.5
##  3  1960 Jan   1960-01-03 2. E/W                         261              25.6
##  4  1960 Jan   1960-01-04 2. E/W                         247              23  
##  5  1960 Jan   1960-01-05 2. E/W                         257              28.7
##  6  1960 Jan   1960-01-06 3. S                           239              30.3
##  7  1960 Jan   1960-01-07 3. S                           223              30.9
##  8  1960 Jan   1960-01-08 2. E/W                         272              32.8
##  9  1960 Jan   1960-01-09 3. S                           161              18.4
## 10  1960 Jan   1960-01-10 2. E/W                         284              10.2
## # ℹ 23,366 more rows
dfDailyWind %>%
    count(year, winddir) %>%
    group_by(year) %>%
    mutate(pct=n/sum(n)) %>%
    ungroup() %>%
    ggplot(aes(x=year, y=pct)) + 
    geom_line(aes(group=winddir, color=winddir), lwd=1) + 
    labs(x=NULL, 
         y="Percent of Days", 
         title=paste0("Predominant Wind Direction", " (Detroit 1960-2023)"), 
         caption="N (300-060)\nE/W (061-119, 241-299)\nS (120-240)"
         ) + 
    scale_color_discrete(NULL) + 
    geom_hline(data=~summarize(group_by(., winddir), pct=mean(pct)), 
               aes(color=winddir, group=winddir, yintercept=pct), 
               lty=2
               ) +
    lims(y=c(0, NA)) +
    theme(legend.position = "bottom")

Boxplots for maximum windspeed are created:

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, windspeed_10m_max) %>%
    ggplot(aes(x=month, y=windspeed_10m_max)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Maximum daily wind speed (kph)", 
         title="Maximum daily windspeed by month (Detroit 1960-2023)"
         )

Boxplots for precipitation are created:

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_hours) %>%
    ggplot(aes(x=month, y=precipitation_hours)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation hours", 
         title="Daily precipitation hours by month (Detroit 1960-2023)"
         )

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_sum) %>%
    ggplot(aes(x=month, y=precipitation_sum)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation (mm)", 
         title="Daily precipitation by month (Detroit 1960-2023)"
         )

Boxplots for temperature are created:

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_max) %>%
    ggplot(aes(x=month, y=temperature_2m_max)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily high temperature (C)", 
         title="Daily high temperature by month (Detroit 1960-2023)"
         )

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_min) %>%
    ggplot(aes(x=month, y=temperature_2m_min)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily low temperature (C)", 
         title="Daily low temperature by month (Detroit 1960-2023)"
         )

ACF and PACF are explored for maximum temperature:

acfTemp <- acf(dfDailyDTW$temperature_2m_max, lag.max=1000)

as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 367 728
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 184 551 915
pacfTemp <- pacf(dfDailyDTW$temperature_2m_max, lag.max=1000)

pacf(dfDailyDTW$temperature_2m_max, lag.max=50)

As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year

ACF and PACF are explored for precipitation:

acfPrecip <- acf(dfDailyDTW$precipitation_sum, lag.max=1000)

acf(dfDailyDTW$precipitation_sum, lag.max=50)

as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
##  [1]  30  56  77 101 118 147 179 197 213 250 266 301 327 358 371 400 428 442 454
## [20] 469 489 508 520 541 553 573 588 605 618 631 644 657 678 690 727 746 769 790
## [39] 820 847 859 879 898 918 929 943 958 974
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  18  32  60  86 114 137 162 189 209 247 261 280 296 325 340 354 386 410 426
## [20] 451 462 494 515 535 550 568 583 603 615 652 680 708 725 742 771 806 831 852
## [39] 875 907 922 934 956 977
pacfPrecip <- pacf(dfDailyDTW$precipitation_sum, lag.max=1000)

pacf(dfDailyDTW$precipitation_sum, lag.max=50)

Precipitation by contrast has little seasonality and mostly just a correlation to the previous day’s value

ACF and PACF are explored for wind speed:

acfWind <- acf(dfDailyDTW$windspeed_10m_max, lag.max=1000)

acf(dfDailyDTW$windspeed_10m_max, lag.max=50)

as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
##  [1]  28  41  73 142 167 182 202 228 239 281 316 330 354 374 408 476 490 523 537
## [20] 564 620 695 717 737 834 852 873 900 952 977
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  31  87 136 154 169 191 210 253 320 356 376 404 451 483 502 518 556 575 588
## [20] 600 753 839 869 916 935 948
pacfWind <- pacf(dfDailyDTW$windspeed_10m_max, lag.max=1000)

pacf(dfDailyDTW$windspeed_10m_max, lag.max=50)

Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.1 for wind speed vs. ~0.8 for temperature)

A boxplot for delta daily temperature is created:

dfDailyDTW %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=temperature_2m_max-lag(temperature_2m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum temperature")

Daily temperature changes are generally larger in magnitude during the colder season

A boxplot for delta daily wind speed is created:

dfDailyDTW %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=windspeed_10m_max-lag(windspeed_10m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum wind speed")

Daily wind speed changes are generally larger in magnitude during the colder season

A boxplot for delta daily precipitation is created:

dfDailyDTW %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=precipitation_sum-lag(precipitation_sum)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in precipitation")

Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers

Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21 <- dfDailyDTW %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21 %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean") + 
    theme(legend.position="none")

Outliers are likely much more important in driving average precipitation than in driving average temperature

Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21_sd <- dfDailyDTW %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    group_by(doy) %>%
    mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
    ungroup() %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21_sd %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean)) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") + 
    theme(legend.position="none")

Means and standard deviations are plotted together:

df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- daily standard deviation", 
         title="Rolling 21-day mean +/- one rolling 21-day sd"
         ) + 
    theme(legend.position="none")

Means and approximate SEM are plotted together:

nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- 1 SEM (approx)", 
         title="Rolling 21-day mean +/- 1 SEM (approx)"
         ) + 
    theme(legend.position="none")

Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.

Percentage of cumulative precipitation by volume of daily precipitation is explored:

tmpPlotData <- dfDailyDTW %>%
    select(date, precipitation_sum) %>%
    group_by(precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n)) 

tmpPlotData %>%
    pivot_longer(cols=-c(precipitation_sum)) %>%
    ggplot(aes(x=precipitation_sum, y=value)) + 
    geom_line(data=~filter(., name %in% c("cpp", "cpn")), 
              aes(group=name, color=c("cpn"="# events", "cpp"="precip")[name])
              ) + 
    labs(x="Daily precipitation (mm)", y="% total (cumul)") + 
    scale_color_discrete("Metric")

Around half of days have no precipitation. Around 25% of total precipitation comes from the rare day with 15+ mm (above about half an inch) of precipitation

Total precipitation by decade is also explored, sorted by daily precipitation:

dfDailyDTW %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
    group_by(decade, precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    group_by(decade) %>%
    mutate(csp=cumsum(precip), ntot=sum(n), meancsp=365*csp/ntot) %>%
    ungroup() %>%
    ggplot(aes(x=precipitation_sum, y=meancsp)) + 
    geom_line(aes(group=factor(decade), color=factor(decade)), lwd=1) + 
    labs(x="Daily precipitation (mm)", 
         y="Cumulative annual precipitation (mm)", 
         title="Average annual precipitation by decade", 
         subtitle="Cumulative, for daily precipitation amounts LTE x-axis"
         ) + 
    scale_color_discrete("Decade")

Trends look similar by decade for precipitation amounts under 0.5 inch (12.5 mm). Heavier precipitation amounts appear to be more prevalent in the 2020s and lightest in the 1960s, driving differences in annual precipitation

Long term daily data is downloaded for Chicago:

# Daily data download for Chicago, IL
testURLDaily <- helperOpenMeteoURL(cityName="Chicago IL", 
                                   dailyIndices=1:nrow(tblMetricsDaily),
                                   startDate="1960-01-01", 
                                   endDate="2023-12-31", 
                                   tz="US/Central"
                                   )
## 
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=41.84&longitude=-87.68&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FCentral"
# Download file
if(!file.exists("testOM_daily_ord.json")) {
    fileDownload(fileName="testOM_daily_ord.json", url=testURLDaily)
} else {
    cat("\nFile testOM_daily_ord.json already exists, skipping download\n")
}
## 
## File testOM_daily_ord.json already exists, skipping download

The daily dataset is loaded:

# Read daily JSON file
ordOMDaily <- formatOpenMeteoJSON("testOM_daily_ord.json")
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## $tblDaily
## # A tibble: 23,376 × 18
##    date       time       weathercode temperature_2m_max temperature_2m_min
##    <date>     <chr>            <int>              <dbl>              <dbl>
##  1 1960-01-01 1960-01-01           3                1.6               -3.5
##  2 1960-01-02 1960-01-02          51                4.9               -0.9
##  3 1960-01-03 1960-01-03           3               -0.6               -7.5
##  4 1960-01-04 1960-01-04           2               -5.9              -11.8
##  5 1960-01-05 1960-01-05           3               -6.1              -11.2
##  6 1960-01-06 1960-01-06           1                1                -10.3
##  7 1960-01-07 1960-01-07           3                4.9               -3.2
##  8 1960-01-08 1960-01-08           3                1.6               -2.8
##  9 1960-01-09 1960-01-09          51                7.1               -3.4
## 10 1960-01-10 1960-01-10          51                4.2               -1  
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## #   apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## #   snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## #   windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## #   winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## #   et0_fao_evapotranspiration <dbl>
## 
## $tblHourly
## NULL
## 
## $tblUnits
## # A tibble: 17 × 4
##    metricType  name                       value      description                
##    <chr>       <chr>                      <chr>      <chr>                      
##  1 daily_units time                       "iso8601"  <NA>                       
##  2 daily_units weathercode                "wmo code" The most severe weather co…
##  3 daily_units temperature_2m_max         "deg C"    Maximum and minimum daily …
##  4 daily_units temperature_2m_min         "deg C"    Maximum and minimum daily …
##  5 daily_units apparent_temperature_max   "deg C"    Maximum and minimum daily …
##  6 daily_units apparent_temperature_min   "deg C"    Maximum and minimum daily …
##  7 daily_units precipitation_sum          "mm"       Sum of daily precipitation…
##  8 daily_units rain_sum                   "mm"       Sum of daily rain          
##  9 daily_units snowfall_sum               "cm"       Sum of daily snowfall      
## 10 daily_units precipitation_hours        "h"        The number of hours with r…
## 11 daily_units sunrise                    "iso8601"  Sun rise and set times     
## 12 daily_units sunset                     "iso8601"  Sun rise and set times     
## 13 daily_units windspeed_10m_max          "km/h"     Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max          "km/h"     Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg "     Dominant wind direction    
## 16 daily_units shortwave_radiation_sum    "MJ/m²"    The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm"       Daily sum of ET0 Reference…
## 
## $tblDescription
## # A tibble: 1 × 7
##   latitude longitude generationtime_ms utc_offset_seconds timezone  
##      <dbl>     <dbl>             <dbl>              <int> <chr>     
## 1     41.9     -87.6              369.             -21600 US/Central
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
## 
## 
## latitude: 41.86292
## longitude: -87.64877
## generationtime_ms: 368.9336
## utc_offset_seconds: -21600
## timezone: US/Central
## timezone_abbreviation: GMT-6
## elevation: 180
# Sample records of tibble
ordOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <int> 3, 51, 3, 2, 3, 1, 3, 3, 51, 51, 61, 63, 53…
## $ temperature_2m_max         <dbl> 1.6, 4.9, -0.6, -5.9, -6.1, 1.0, 4.9, 1.6, …
## $ temperature_2m_min         <dbl> -3.5, -0.9, -7.5, -11.8, -11.2, -10.3, -3.2…
## $ apparent_temperature_max   <dbl> -4.3, -0.6, -6.9, -13.1, -13.6, -4.4, -1.3,…
## $ apparent_temperature_min   <dbl> -7.9, -7.0, -14.4, -18.4, -17.7, -17.9, -8.…
## $ precipitation_sum          <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ rain_sum                   <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ snowfall_sum               <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 0, 11, 0, 0, 0, 0, 0, 0, 6, 1, 5, 24, 5, 10…
## $ sunrise                    <chr> "1960-01-01T07:18", "1960-01-02T07:18", "19…
## $ sunset                     <chr> "1960-01-01T16:29", "1960-01-02T16:30", "19…
## $ windspeed_10m_max          <dbl> 22.7, 27.6, 27.0, 29.1, 29.6, 32.1, 32.7, 3…
## $ windgusts_10m_max          <dbl> 40.0, 52.6, 44.6, 50.8, 48.2, 52.6, 56.5, 5…
## $ winddirection_10m_dominant <int> 142, 214, 268, 247, 261, 232, 234, 275, 185…
## $ shortwave_radiation_sum    <dbl> 7.45, 2.25, 4.58, 8.66, 9.09, 8.79, 5.86, 8…
## $ et0_fao_evapotranspiration <dbl> 0.95, 0.66, 1.06, 1.06, 1.04, 1.33, 1.23, 1…

Variables are converted to proper data type:

dfDailyORD <- ordOMDaily$tblDaily %>%
    mutate(weathercode=factor(weathercode), 
           sunrise_chr=sunrise, 
           sunset_chr=sunset,
           sunrise=lubridate::ymd_hm(sunrise), 
           sunset=lubridate::ymd_hm(sunset), 
           fct_winddir=factor(winddirection_10m_dominant)
           )
glimpse(dfDailyORD)
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 3, 51, 3, 2, 3, 1, 3, 3, 51, 51, 61, 63, 53…
## $ temperature_2m_max         <dbl> 1.6, 4.9, -0.6, -5.9, -6.1, 1.0, 4.9, 1.6, …
## $ temperature_2m_min         <dbl> -3.5, -0.9, -7.5, -11.8, -11.2, -10.3, -3.2…
## $ apparent_temperature_max   <dbl> -4.3, -0.6, -6.9, -13.1, -13.6, -4.4, -1.3,…
## $ apparent_temperature_min   <dbl> -7.9, -7.0, -14.4, -18.4, -17.7, -17.9, -8.…
## $ precipitation_sum          <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ rain_sum                   <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ snowfall_sum               <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 0, 11, 0, 0, 0, 0, 0, 0, 6, 1, 5, 24, 5, 10…
## $ sunrise                    <dttm> 1960-01-01 07:18:00, 1960-01-02 07:18:00, …
## $ sunset                     <dttm> 1960-01-01 16:29:00, 1960-01-02 16:30:00, …
## $ windspeed_10m_max          <dbl> 22.7, 27.6, 27.0, 29.1, 29.6, 32.1, 32.7, 3…
## $ windgusts_10m_max          <dbl> 40.0, 52.6, 44.6, 50.8, 48.2, 52.6, 56.5, 5…
## $ winddirection_10m_dominant <int> 142, 214, 268, 247, 261, 232, 234, 275, 185…
## $ shortwave_radiation_sum    <dbl> 7.45, 2.25, 4.58, 8.66, 9.09, 8.79, 5.86, 8…
## $ et0_fao_evapotranspiration <dbl> 0.95, 0.66, 1.06, 1.06, 1.04, 1.33, 1.23, 1…
## $ sunrise_chr                <chr> "1960-01-01T07:18", "1960-01-02T07:18", "19…
## $ sunset_chr                 <chr> "1960-01-01T16:29", "1960-01-02T16:30", "19…
## $ fct_winddir                <fct> 142, 214, 268, 247, 261, 232, 234, 275, 185…

Averages by month for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year, month) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    group_by(month) %>%
    summarize(across(-c(year), .fns=mean)) %>%
    pivot_longer(-c(month)) %>%
    ggplot(aes(x=month, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Chicago 1960-2023)")

Averages by month for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(month, wc, winddir) %>%
    pivot_longer(-c(month)) %>%
    count(month, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=month, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=2) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Chicago 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Averages by year for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    pivot_longer(-c(year)) %>%
    ggplot(aes(x=year, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Annual averages for key metrics (Chicago 1960-2023)")

Averages by year for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(year, wc, winddir) %>%
    pivot_longer(-c(year)) %>%
    count(year, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=year, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=1) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Chicago 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Boxplots for maximum windspeed are created:

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, windspeed_10m_max) %>%
    ggplot(aes(x=month, y=windspeed_10m_max)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Maximum daily wind speed (kph)", 
         title="Maximum daily windspeed by month (Chicago 1960-2023)"
         )

Boxplots for precipitation are created:

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_hours) %>%
    ggplot(aes(x=month, y=precipitation_hours)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation hours", 
         title="Daily precipitation hours by month (Chicago 1960-2023)"
         )

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_sum) %>%
    ggplot(aes(x=month, y=precipitation_sum)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation (mm)", 
         title="Daily precipitation by month (Chicago 1960-2023)"
         )

Boxplots for snow are also created:

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, snowfall_sum) %>%
    ggplot(aes(x=month, y=snowfall_sum)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily snowfall (liquid equivalent mm)", 
         title="Daily snowfall by month (Chicago 1960-2023)"
         )

Boxplots for temperature are created:

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_max) %>%
    ggplot(aes(x=month, y=temperature_2m_max)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily high temperature (C)", 
         title="Daily high temperature by month (Chicago 1960-2023)"
         )

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_min) %>%
    ggplot(aes(x=month, y=temperature_2m_min)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily low temperature (C)", 
         title="Daily low temperature by month (Chicago 1960-2023)"
         )

ACF and PACF are explored for maximum temperature:

acfTemp <- acf(dfDailyORD$temperature_2m_max, lag.max=1000)

as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 366 728
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 180 551 915
pacfTemp <- pacf(dfDailyORD$temperature_2m_max, lag.max=1000)

pacf(dfDailyORD$temperature_2m_max, lag.max=50)

As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year

ACF and PACF are explored for precipitation:

acfPrecip <- acf(dfDailyORD$precipitation_sum, lag.max=1000)

acf(dfDailyORD$precipitation_sum, lag.max=50)

as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
##  [1]  42  80  94 122 167 179 205 216 229 251 268 288 302 332 350 388 413 443 454
## [20] 469 492 528 570 599 612 633 654 690 730 758 770 789 808 820 845 865 881 915
## [39] 926 940 951 976
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  24  49  67  86 114 137 163 181 192 218 231 248 260 281 311 324 340 355 375
## [20] 400 430 451 462 482 521 535 549 560 583 597 616 631 662 687 713 736 757 776
## [39] 795 822 853 875 889 934 949 964 979
pacfPrecip <- pacf(dfDailyORD$precipitation_sum, lag.max=1000)

pacf(dfDailyORD$precipitation_sum, lag.max=50)

Precipitation by contrast has little seasonality and mostly just a slight correlation to the previous day’s value

ACF and PACF are explored for wind speed:

acfWind <- acf(dfDailyORD$windspeed_10m_max, lag.max=1000)

acf(dfDailyORD$windspeed_10m_max, lag.max=50)

as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
##  [1] 149 167 182 207 228 239 353 373 408 476 523 538 554 572 594 620 722 737 852
## [20] 883 901 926 953
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1] 112 126 137 154 169 190 217 253 278 376 469 483 506 518 556 589 633 725 739
## [20] 753 847 862 879 916 939 959 976
pacfWind <- pacf(dfDailyORD$windspeed_10m_max, lag.max=1000)

pacf(dfDailyORD$windspeed_10m_max, lag.max=50)

Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.1 for wind speed vs. ~0.8 for temperature)

A boxplot for delta daily temperature is created:

dfDailyORD %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=temperature_2m_max-lag(temperature_2m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum temperature")

Daily temperature changes are generally larger in magnitude during the colder season

A boxplot for delta daily wind speed is created:

dfDailyORD %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=windspeed_10m_max-lag(windspeed_10m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum wind speed")

Daily wind speed changes are generally similar in magnitude by season

A boxplot for delta daily precipitation is created:

dfDailyORD %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=precipitation_sum-lag(precipitation_sum)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in precipitation")

Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers

Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21 <- dfDailyORD %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21 %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean") + 
    theme(legend.position="none")

Outliers are likely much more important in driving average precipitation than in driving average temperature

Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21_sd <- dfDailyORD %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    group_by(doy) %>%
    mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
    ungroup() %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21_sd %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean)) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") + 
    theme(legend.position="none")

Means and standard deviations are plotted together:

df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- daily standard deviation", 
         title="Rolling 21-day mean +/- one rolling 21-day sd"
         ) + 
    theme(legend.position="none")

Means and approximate SEM are plotted together:

nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- 1 SEM (approx)", 
         title="Rolling 21-day mean +/- 1 SEM (approx)"
         ) + 
    theme(legend.position="none")

Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.

Percentage of cumulative precipitation by volume of daily precipitation is explored:

tmpPlotData <- dfDailyORD %>%
    select(date, precipitation_sum) %>%
    group_by(precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n)) 

tmpPlotData %>%
    pivot_longer(cols=-c(precipitation_sum)) %>%
    ggplot(aes(x=precipitation_sum, y=value)) + 
    geom_line(data=~filter(., name %in% c("cpp", "cpn")), 
              aes(group=name, color=c("cpn"="# events", "cpp"="precip")[name])
              ) + 
    labs(x="Daily precipitation (mm)", y="% total (cumul)") + 
    scale_color_discrete("Metric")

Around half of days have no precipitation. Around 25% of total precipitation comes from the rare day with 20+ mm (nearly an inch) of precipitation

Total precipitation by decade is also explored, sorted by daily precipitation:

dfDailyORD %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
    group_by(decade, precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    group_by(decade) %>%
    mutate(csp=cumsum(precip), ntot=sum(n), meancsp=365*csp/ntot) %>%
    ungroup() %>%
    ggplot(aes(x=precipitation_sum, y=meancsp)) + 
    geom_line(aes(group=factor(decade), color=factor(decade)), lwd=1) + 
    labs(x="Daily precipitation (mm)", 
         y="Cumulative annual precipitation (mm)", 
         title="Average annual precipitation by decade", 
         subtitle="Cumulative, for daily precipitation amounts LTE x-axis"
         ) + 
    scale_color_discrete("Decade")

Trends look similar by decade for precipitation amounts under 0.5 inch (12.5 mm). Heavier precipitation amounts appear to be more prevalent in the 2010s/2020s and lightest in the 1960s, driving differences in annual precipitation

Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window, for Atlanta, Detroit, and Chicago:

df_r21 <- bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
    mutate(doy=pmin(yday(date), 365), 
           src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]
           ) %>% 
    group_by(src) %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(src, date, doy, contains("_r21")) %>%
    ungroup()

df_r21 %>%
    na.omit() %>%
    group_by(src, doy) %>%
    summarize(across(where(is.numeric), .fns=mean), n=n(), .groups="drop") %>%
    pivot_longer(cols=-c(src, doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=src, color=src), lwd=1) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean") + 
    theme(legend.position="bottom") + 
    scale_color_discrete(NULL)

Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21_sd <- bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
    mutate(doy=pmin(yday(date), 365), 
           src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]
           ) %>% 
    group_by(src, doy) %>%
    mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
    group_by(src) %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(src, date, doy, contains("_r21")) %>%
    ungroup()

df_r21_sd %>%
    na.omit() %>%
    group_by(src, doy) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(src, doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=src, color=src), lwd=1) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") + 
    theme(legend.position="bottom")

Means and approximate SEM are plotted together:

nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
    bind_rows(df_r21_sd, .id="typ") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[typ]) %>%
    group_by(src, doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(src, doy, musig)) %>%
    pivot_wider(id_cols=c(src, doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=src, color=src), lty=2) + 
    geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=src), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- 1 SEM (approx)", 
         title="Rolling 21-day mean +/- 1 SEM (approx)"
         ) + 
    theme(legend.position="bottom")

Percentage of cumulative precipitation by volume of daily precipitation is explored, for three cities:

tmpPlotData <- bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
    mutate(src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]) %>% 
    select(src, date, precipitation_sum) %>%
    group_by(src, precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    group_by(src) %>%
    mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n)) %>%
    ungroup()

tmpPlotData %>%
    pivot_longer(cols=-c(src, precipitation_sum)) %>%
    filter(name %in% c("cpp", "cpn")) %>%
    ggplot(aes(x=precipitation_sum, y=value)) + 
    geom_line(aes(group=src, color=src)) + 
    labs(x="Daily precipitation (mm)", y="% total (cumul)") + 
    scale_color_discrete("Metric") + 
    facet_wrap(~c("cpn"="# events", "cpp"="precip")[name])

Around half of days have no precipitation. In all three cities, around 25% of total precipitation comes from the rare day with 15-20+ mm (half inch to nearly an inch) of precipitation

Frequency of precipitation by change in temperature is explored:

bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
    mutate(src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]) %>%
    select(src, date, precipitation_sum, temperature_2m_max) %>%
    group_by(src) %>%
    mutate(isPrecip=precipitation_sum>0, 
           dTemp=temperature_2m_max-lag(temperature_2m_max), 
           month=factor(month.abb[month(date)], levels=month.abb)
           ) %>%
    ungroup() %>%
    filter(complete.cases(.)) %>%
    mutate(dTempRound=autoRound(dTemp, rndTo=2.5), dTempRound=pmax(-10, pmin(10, dTempRound))) %>%
    group_by(src, dTempRound) %>%
    summarize(pctPrecip=mean(isPrecip), muPrecip=mean(precipitation_sum), n=n(), .groups="drop") %>%
    mutate(sdpctPrecip=sqrt(pctPrecip*(1-pctPrecip)/n)) %>%
    ggplot(aes(x=dTempRound)) + 
    geom_line(aes(y=pctPrecip, color=src, group=src)) + 
    geom_ribbon(aes(ymin=pctPrecip-sdpctPrecip, ymax=pctPrecip+sdpctPrecip, fill=src, group=src), alpha=0.5) +
    labs(x="Temperature Change from Yesterday (C)\n(rounded to nearest 2.5, capped at +/- 10)", 
         y="% Days with Precipitation > 0\n(mean +/- 1 SE)", 
         title="Frequency of precipitation by change in temperature"
         ) +
    lims(y=c(0, 1)) + 
    scale_fill_discrete(NULL) + 
    scale_color_discrete(guide="none")

Month is a possible confounder, since temperature change and precipitation both vary by season:

bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
    mutate(src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]) %>%
    select(src, date, precipitation_sum, temperature_2m_max) %>%
    group_by(src) %>%
    mutate(isPrecip=precipitation_sum>0, 
           dTemp=temperature_2m_max-lag(temperature_2m_max), 
           month=factor(month.abb[month(date)], levels=month.abb)
           ) %>%
    ungroup() %>%
    filter(complete.cases(.)) %>%
    mutate(dTempRound=autoRound(dTemp, rndTo=2.5), dTempRound=pmax(-10, pmin(10, dTempRound))) %>%
    group_by(src, month, dTempRound) %>%
    summarize(pctPrecip=mean(isPrecip), muPrecip=mean(precipitation_sum), n=n(), .groups="drop") %>%
    mutate(sdpctPrecip=sqrt(pctPrecip*(1-pctPrecip)/n)) %>%
    filter(n>=5) %>%
    ggplot(aes(x=dTempRound)) + 
    geom_line(aes(y=pctPrecip, color=src, group=src)) + 
    geom_ribbon(aes(ymin=pctPrecip-sdpctPrecip, ymax=pctPrecip+sdpctPrecip, fill=src, group=src), alpha=0.5) +
    facet_wrap(~month) +
    labs(x="Temperature Change from Yesterday (C)\n(rounded to nearest 2.5, capped at +/- 10)", 
         y="% Days with Precipitation > 0\n(mean +/- 1 SE)", 
         title="Frequency of precipitation by change in temperature"
         ) +
    lims(y=c(0, 1)) + 
    scale_fill_discrete(NULL) + 
    scale_color_discrete(guide="none")

At a glance, there appears to be some association between temperature change and frequency of precipitation, even after controlling for city and month

Long term daily data is downloaded for New Orleans:

# Daily data download for New Orleans, LA
testURLDaily <- helperOpenMeteoURL(cityName="New Orleans LA", 
                                   dailyIndices=1:nrow(tblMetricsDaily),
                                   startDate="1960-01-01", 
                                   endDate="2023-12-31", 
                                   tz="US/Central"
                                   )
## 
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=30.07&longitude=-89.93&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FCentral"
# Download file
if(!file.exists("testOM_daily_msy.json")) {
    fileDownload(fileName="testOM_daily_msy.json", url=testURLDaily)
} else {
    cat("\nFile testOM_daily_msy.json already exists, skipping download\n")
}
## 
## File testOM_daily_msy.json already exists, skipping download

The daily dataset is loaded:

# Read daily JSON file
msyOMDaily <- formatOpenMeteoJSON("testOM_daily_msy.json")
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## $tblDaily
## # A tibble: 23,376 × 18
##    date       time       weathercode temperature_2m_max temperature_2m_min
##    <date>     <chr>            <int>              <dbl>              <dbl>
##  1 1960-01-01 1960-01-01          61               15.5               10.9
##  2 1960-01-02 1960-01-02          63               19.1               12.3
##  3 1960-01-03 1960-01-03          51               17.2                8.6
##  4 1960-01-04 1960-01-04           3               12.1                6.4
##  5 1960-01-05 1960-01-05          51               17.9               11.3
##  6 1960-01-06 1960-01-06          63               18.2               10.3
##  7 1960-01-07 1960-01-07          51               10.3                6.1
##  8 1960-01-08 1960-01-08           3               11.7                4.6
##  9 1960-01-09 1960-01-09           3               15.8                7.5
## 10 1960-01-10 1960-01-10          51               19.1               12.8
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## #   apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## #   snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## #   windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## #   winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## #   et0_fao_evapotranspiration <dbl>
## 
## $tblHourly
## NULL
## 
## $tblUnits
## # A tibble: 17 × 4
##    metricType  name                       value      description                
##    <chr>       <chr>                      <chr>      <chr>                      
##  1 daily_units time                       "iso8601"  <NA>                       
##  2 daily_units weathercode                "wmo code" The most severe weather co…
##  3 daily_units temperature_2m_max         "deg C"    Maximum and minimum daily …
##  4 daily_units temperature_2m_min         "deg C"    Maximum and minimum daily …
##  5 daily_units apparent_temperature_max   "deg C"    Maximum and minimum daily …
##  6 daily_units apparent_temperature_min   "deg C"    Maximum and minimum daily …
##  7 daily_units precipitation_sum          "mm"       Sum of daily precipitation…
##  8 daily_units rain_sum                   "mm"       Sum of daily rain          
##  9 daily_units snowfall_sum               "cm"       Sum of daily snowfall      
## 10 daily_units precipitation_hours        "h"        The number of hours with r…
## 11 daily_units sunrise                    "iso8601"  Sun rise and set times     
## 12 daily_units sunset                     "iso8601"  Sun rise and set times     
## 13 daily_units windspeed_10m_max          "km/h"     Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max          "km/h"     Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg "     Dominant wind direction    
## 16 daily_units shortwave_radiation_sum    "MJ/m²"    The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm"       Daily sum of ET0 Reference…
## 
## $tblDescription
## # A tibble: 1 × 7
##   latitude longitude generationtime_ms utc_offset_seconds timezone  
##      <dbl>     <dbl>             <dbl>              <int> <chr>     
## 1     30.1     -89.9             7165.             -18000 US/Central
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
## 
## 
## latitude: 30.05272
## longitude: -89.89499
## generationtime_ms: 7164.689
## utc_offset_seconds: -18000
## timezone: US/Central
## timezone_abbreviation: GMT-5
## elevation: 0
# Sample records of tibble
msyOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <int> 61, 63, 51, 3, 51, 63, 51, 3, 3, 51, 3, 3, …
## $ temperature_2m_max         <dbl> 15.5, 19.1, 17.2, 12.1, 17.9, 18.2, 10.3, 1…
## $ temperature_2m_min         <dbl> 10.9, 12.3, 8.6, 6.4, 11.3, 10.3, 6.1, 4.6,…
## $ apparent_temperature_max   <dbl> 13.6, 20.8, 15.8, 9.2, 20.1, 20.2, 7.7, 9.1…
## $ apparent_temperature_min   <dbl> 6.6, 9.8, 4.1, 2.7, 8.0, 7.3, 2.5, 1.7, 4.8…
## $ precipitation_sum          <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ rain_sum                   <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ snowfall_sum               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ precipitation_hours        <dbl> 12, 12, 2, 0, 7, 19, 1, 0, 0, 1, 0, 0, 0, 8…
## $ sunrise                    <chr> "1960-01-01T07:55", "1960-01-02T07:55", "19…
## $ sunset                     <chr> "1960-01-01T18:10", "1960-01-02T18:11", "19…
## $ windspeed_10m_max          <dbl> 28.9, 18.8, 29.8, 16.2, 21.0, 19.7, 22.7, 1…
## $ windgusts_10m_max          <dbl> 53.6, 36.4, 50.0, 27.7, 34.6, 59.4, 39.2, 2…
## $ winddirection_10m_dominant <int> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…
## $ shortwave_radiation_sum    <dbl> 7.86, 4.91, 13.14, 13.63, 4.33, 1.05, 10.67…
## $ et0_fao_evapotranspiration <dbl> 1.05, 0.79, 2.22, 2.17, 1.14, 0.28, 1.30, 1…

Variables are converted to proper data type:

dfDailyMSY <- msyOMDaily$tblDaily %>%
    mutate(weathercode=factor(weathercode), 
           sunrise_chr=sunrise, 
           sunset_chr=sunset,
           sunrise=lubridate::ymd_hm(sunrise), 
           sunset=lubridate::ymd_hm(sunset), 
           fct_winddir=factor(winddirection_10m_dominant)
           )
glimpse(dfDailyMSY)
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 61, 63, 51, 3, 51, 63, 51, 3, 3, 51, 3, 3, …
## $ temperature_2m_max         <dbl> 15.5, 19.1, 17.2, 12.1, 17.9, 18.2, 10.3, 1…
## $ temperature_2m_min         <dbl> 10.9, 12.3, 8.6, 6.4, 11.3, 10.3, 6.1, 4.6,…
## $ apparent_temperature_max   <dbl> 13.6, 20.8, 15.8, 9.2, 20.1, 20.2, 7.7, 9.1…
## $ apparent_temperature_min   <dbl> 6.6, 9.8, 4.1, 2.7, 8.0, 7.3, 2.5, 1.7, 4.8…
## $ precipitation_sum          <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ rain_sum                   <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ snowfall_sum               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ precipitation_hours        <dbl> 12, 12, 2, 0, 7, 19, 1, 0, 0, 1, 0, 0, 0, 8…
## $ sunrise                    <dttm> 1960-01-01 07:55:00, 1960-01-02 07:55:00, …
## $ sunset                     <dttm> 1960-01-01 18:10:00, 1960-01-02 18:11:00, …
## $ windspeed_10m_max          <dbl> 28.9, 18.8, 29.8, 16.2, 21.0, 19.7, 22.7, 1…
## $ windgusts_10m_max          <dbl> 53.6, 36.4, 50.0, 27.7, 34.6, 59.4, 39.2, 2…
## $ winddirection_10m_dominant <int> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…
## $ shortwave_radiation_sum    <dbl> 7.86, 4.91, 13.14, 13.63, 4.33, 1.05, 10.67…
## $ et0_fao_evapotranspiration <dbl> 1.05, 0.79, 2.22, 2.17, 1.14, 0.28, 1.30, 1…
## $ sunrise_chr                <chr> "1960-01-01T07:55", "1960-01-02T07:55", "19…
## $ sunset_chr                 <chr> "1960-01-01T18:10", "1960-01-02T18:11", "19…
## $ fct_winddir                <fct> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…

Averages by month for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year, month) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    group_by(month) %>%
    summarize(across(-c(year), .fns=mean)) %>%
    pivot_longer(-c(month)) %>%
    ggplot(aes(x=month, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Monthly averages for key metrics (New Orleans 1960-2023)")

Averages by month for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(month, wc, winddir) %>%
    pivot_longer(-c(month)) %>%
    count(month, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=month, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=2) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (New Orleans 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Averages by year for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    pivot_longer(-c(year)) %>%
    ggplot(aes(x=year, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Annual averages for key metrics (New Orleans 1960-2023)")

Averages by year for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(year, wc, winddir) %>%
    pivot_longer(-c(year)) %>%
    count(year, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=year, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=1) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (New Orleans 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Boxplots for maximum windspeed are created:

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, windspeed_10m_max) %>%
    ggplot(aes(x=month, y=windspeed_10m_max)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Maximum daily wind speed (kph)", 
         title="Maximum daily windspeed by month (New Orleans 1960-2023)"
         )

Boxplots for precipitation are created:

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_hours) %>%
    ggplot(aes(x=month, y=precipitation_hours)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation hours", 
         title="Daily precipitation hours by month (New Orleans 1960-2023)"
         )

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_sum) %>%
    ggplot(aes(x=month, y=precipitation_sum)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation (mm)", 
         title="Daily precipitation by month (New Orleans 1960-2023)"
         )

Boxplots for temperature are created:

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_max) %>%
    ggplot(aes(x=month, y=temperature_2m_max)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily high temperature (C)", 
         title="Daily high temperature by month (New Orleans 1960-2023)"
         )

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_min) %>%
    ggplot(aes(x=month, y=temperature_2m_min)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily low temperature (C)", 
         title="Daily low temperature by month (New Orleans 1960-2023)"
         )

ACF and PACF are explored for maximum temperature:

acfTemp <- acf(dfDailyMSY$temperature_2m_max, lag.max=1000)

as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 371 736
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 185 546 917
pacfTemp <- pacf(dfDailyMSY$temperature_2m_max, lag.max=1000)

pacf(dfDailyMSY$temperature_2m_max, lag.max=50)

As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year

ACF and PACF are explored for precipitation:

acfPrecip <- acf(dfDailyMSY$precipitation_sum, lag.max=1000)

acf(dfDailyMSY$precipitation_sum, lag.max=50)

as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
##  [1]  18  70  99 113 127 139 155 179 203 227 241 264 327 362 382 403 433 452 465
## [20] 486 530 544 565 577 628 641 670 690 703 721 737 761 792 817 842 854 880 892
## [39] 909 937 967 989
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  22  50  64  82 104 131 150 164 176 196 223 247 261 290 307 329 345 376 388
## [20] 412 428 439 456 474 492 526 552 568 580 592 625 636 659 680 700 725 741 757
## [39] 775 797 809 830 846 872 885 906 918 944 971
pacfPrecip <- pacf(dfDailyMSY$precipitation_sum, lag.max=1000)

pacf(dfDailyMSY$precipitation_sum, lag.max=50)

Precipitation by contrast has little seasonality and mostly just a slight correlation to the previous day’s value

ACF and PACF are explored for wind speed:

acfWind <- acf(dfDailyMSY$windspeed_10m_max, lag.max=1000)

acf(dfDailyMSY$windspeed_10m_max, lag.max=50)

as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
##  [1] 170 207 227 333 353 542 579 713 725 741 767 893 920 938
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1] 164 191 365 532 549 583 719 734 869 911 922
pacfWind <- pacf(dfDailyMSY$windspeed_10m_max, lag.max=1000)

pacf(dfDailyMSY$windspeed_10m_max, lag.max=50)

Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.2 for wind speed vs. ~0.8 for temperature)

A boxplot for delta daily temperature is created:

dfDailyMSY %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=temperature_2m_max-lag(temperature_2m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum temperature")

Daily temperature changes are generally larger in magnitude during the colder season

A boxplot for delta daily wind speed is created:

dfDailyMSY %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=windspeed_10m_max-lag(windspeed_10m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum wind speed")

Daily wind speed changes are generally similar in magnitude by season, though with more outliers in Aug-Oct

A boxplot for delta daily precipitation is created:

dfDailyMSY %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=precipitation_sum-lag(precipitation_sum)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in precipitation")

Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers

Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21 <- dfDailyMSY %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21 %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean") + 
    theme(legend.position="none")

Outliers are likely much more important in driving average precipitation than in driving average temperature

Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21_sd <- dfDailyMSY %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    group_by(doy) %>%
    mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
    ungroup() %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21_sd %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean)) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") + 
    theme(legend.position="none")

Means and standard deviations are plotted together:

df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- daily standard deviation", 
         title="Rolling 21-day mean +/- one rolling 21-day sd"
         ) + 
    theme(legend.position="none")

Means and approximate SEM are plotted together:

nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- 1 SEM (approx)", 
         title="Rolling 21-day mean +/- 1 SEM (approx)"
         ) + 
    theme(legend.position="none")

Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.

Long term daily data is downloaded for Denver:

# Daily data download for Denver, CO
testURLDaily <- helperOpenMeteoURL(cityName="Denver CO", 
                                   dailyIndices=1:nrow(tblMetricsDaily),
                                   startDate="1960-01-01", 
                                   endDate="2023-12-31", 
                                   tz="US/Mountain"
                                   )
## 
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=39.77&longitude=-104.87&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FMountain"
# Download file
if(!file.exists("testOM_daily_den.json")) {
    fileDownload(fileName="testOM_daily_den.json", url=testURLDaily)
} else {
    cat("\nFile testOM_daily_den.json already exists, skipping download\n")
}
## 
## File testOM_daily_den.json already exists, skipping download

The daily dataset is loaded:

# Read daily JSON file
denOMDaily <- formatOpenMeteoJSON("testOM_daily_den.json")
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## $tblDaily
## # A tibble: 23,376 × 18
##    date       time       weathercode temperature_2m_max temperature_2m_min
##    <date>     <chr>            <int>              <dbl>              <dbl>
##  1 1960-01-01 1960-01-01          75               -4.5              -11  
##  2 1960-01-02 1960-01-02           3               -5.7              -19.2
##  3 1960-01-03 1960-01-03           1               -7.4              -22.5
##  4 1960-01-04 1960-01-04           3               -1.9              -22.8
##  5 1960-01-05 1960-01-05           1               -1.5              -20.8
##  6 1960-01-06 1960-01-06           3                3.4              -16  
##  7 1960-01-07 1960-01-07           1                5.4              -12.7
##  8 1960-01-08 1960-01-08           3                9.6               -9.1
##  9 1960-01-09 1960-01-09           3               11.7               -5.6
## 10 1960-01-10 1960-01-10           3                9.2               -4.1
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## #   apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## #   snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## #   windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## #   winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## #   et0_fao_evapotranspiration <dbl>
## 
## $tblHourly
## NULL
## 
## $tblUnits
## # A tibble: 17 × 4
##    metricType  name                       value      description                
##    <chr>       <chr>                      <chr>      <chr>                      
##  1 daily_units time                       "iso8601"  <NA>                       
##  2 daily_units weathercode                "wmo code" The most severe weather co…
##  3 daily_units temperature_2m_max         "deg C"    Maximum and minimum daily …
##  4 daily_units temperature_2m_min         "deg C"    Maximum and minimum daily …
##  5 daily_units apparent_temperature_max   "deg C"    Maximum and minimum daily …
##  6 daily_units apparent_temperature_min   "deg C"    Maximum and minimum daily …
##  7 daily_units precipitation_sum          "mm"       Sum of daily precipitation…
##  8 daily_units rain_sum                   "mm"       Sum of daily rain          
##  9 daily_units snowfall_sum               "cm"       Sum of daily snowfall      
## 10 daily_units precipitation_hours        "h"        The number of hours with r…
## 11 daily_units sunrise                    "iso8601"  Sun rise and set times     
## 12 daily_units sunset                     "iso8601"  Sun rise and set times     
## 13 daily_units windspeed_10m_max          "km/h"     Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max          "km/h"     Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg "     Dominant wind direction    
## 16 daily_units shortwave_radiation_sum    "MJ/m²"    The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm"       Daily sum of ET0 Reference…
## 
## $tblDescription
## # A tibble: 1 × 7
##   latitude longitude generationtime_ms utc_offset_seconds timezone   
##      <dbl>     <dbl>             <dbl>              <int> <chr>      
## 1     39.8     -105.             8310.             -21600 US/Mountain
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
## 
## 
## latitude: 39.75395
## longitude: -104.8957
## generationtime_ms: 8310.172
## utc_offset_seconds: -21600
## timezone: US/Mountain
## timezone_abbreviation: GMT-6
## elevation: 1608
# Sample records of tibble
denOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <int> 75, 3, 1, 3, 1, 3, 1, 3, 3, 3, 3, 51, 73, 7…
## $ temperature_2m_max         <dbl> -4.5, -5.7, -7.4, -1.9, -1.5, 3.4, 5.4, 9.6…
## $ temperature_2m_min         <dbl> -11.0, -19.2, -22.5, -22.8, -20.8, -16.0, -…
## $ apparent_temperature_max   <dbl> -8.9, -10.0, -11.9, -6.2, -5.5, -0.3, 0.5, …
## $ apparent_temperature_min   <dbl> -15.8, -24.5, -28.3, -28.8, -26.4, -21.4, -…
## $ precipitation_sum          <dbl> 8.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum                   <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum               <dbl> 5.88, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 17,…
## $ sunrise                    <chr> "1960-01-01T08:20", "1960-01-02T08:20", "19…
## $ sunset                     <chr> "1960-01-01T17:45", "1960-01-02T17:46", "19…
## $ windspeed_10m_max          <dbl> 22.1, 9.8, 13.3, 14.5, 14.3, 13.4, 16.1, 17…
## $ windgusts_10m_max          <dbl> 36.0, 15.8, 21.2, 23.4, 21.2, 22.3, 25.9, 2…
## $ winddirection_10m_dominant <int> 15, 156, 154, 149, 175, 196, 166, 182, 188,…
## $ shortwave_radiation_sum    <dbl> 6.21, 10.29, 10.29, 11.02, 10.95, 10.69, 10…
## $ et0_fao_evapotranspiration <dbl> 0.51, 0.73, 0.71, 0.89, 0.90, 1.17, 1.26, 1…

Variables are converted to proper data type:

dfDailyDEN <- denOMDaily$tblDaily %>%
    mutate(weathercode=factor(weathercode), 
           sunrise_chr=sunrise, 
           sunset_chr=sunset,
           sunrise=lubridate::ymd_hm(sunrise), 
           sunset=lubridate::ymd_hm(sunset), 
           fct_winddir=factor(winddirection_10m_dominant)
           )
glimpse(dfDailyDEN)
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 75, 3, 1, 3, 1, 3, 1, 3, 3, 3, 3, 51, 73, 7…
## $ temperature_2m_max         <dbl> -4.5, -5.7, -7.4, -1.9, -1.5, 3.4, 5.4, 9.6…
## $ temperature_2m_min         <dbl> -11.0, -19.2, -22.5, -22.8, -20.8, -16.0, -…
## $ apparent_temperature_max   <dbl> -8.9, -10.0, -11.9, -6.2, -5.5, -0.3, 0.5, …
## $ apparent_temperature_min   <dbl> -15.8, -24.5, -28.3, -28.8, -26.4, -21.4, -…
## $ precipitation_sum          <dbl> 8.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum                   <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum               <dbl> 5.88, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 17,…
## $ sunrise                    <dttm> 1960-01-01 08:20:00, 1960-01-02 08:20:00, …
## $ sunset                     <dttm> 1960-01-01 17:45:00, 1960-01-02 17:46:00, …
## $ windspeed_10m_max          <dbl> 22.1, 9.8, 13.3, 14.5, 14.3, 13.4, 16.1, 17…
## $ windgusts_10m_max          <dbl> 36.0, 15.8, 21.2, 23.4, 21.2, 22.3, 25.9, 2…
## $ winddirection_10m_dominant <int> 15, 156, 154, 149, 175, 196, 166, 182, 188,…
## $ shortwave_radiation_sum    <dbl> 6.21, 10.29, 10.29, 11.02, 10.95, 10.69, 10…
## $ et0_fao_evapotranspiration <dbl> 0.51, 0.73, 0.71, 0.89, 0.90, 1.17, 1.26, 1…
## $ sunrise_chr                <chr> "1960-01-01T08:20", "1960-01-02T08:20", "19…
## $ sunset_chr                 <chr> "1960-01-01T17:45", "1960-01-02T17:46", "19…
## $ fct_winddir                <fct> 15, 156, 154, 149, 175, 196, 166, 182, 188,…

Averages by month for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyDEN %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year, month) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    group_by(month) %>%
    summarize(across(-c(year), .fns=mean)) %>%
    pivot_longer(-c(month)) %>%
    ggplot(aes(x=month, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Denver 1960-2023)")

Averages by month for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyDEN %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(month, wc, winddir) %>%
    pivot_longer(-c(month)) %>%
    count(month, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=month, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=2) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Denver 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Averages by year for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyDEN %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    pivot_longer(-c(year)) %>%
    ggplot(aes(x=year, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Annual averages for key metrics (Denver 1960-2023)")

Averages by year for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyDEN %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(year, wc, winddir) %>%
    pivot_longer(-c(year)) %>%
    count(year, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=year, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=1) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Denver 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Boxplots for maximum windspeed are created:

dfDailyDEN %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, windspeed_10m_max) %>%
    ggplot(aes(x=month, y=windspeed_10m_max)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Maximum daily wind speed (kph)", 
         title="Maximum daily windspeed by month (Denver 1960-2023)"
         )

Boxplots for precipitation are created:

dfDailyDEN %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_hours) %>%
    ggplot(aes(x=month, y=precipitation_hours)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation hours", 
         title="Daily precipitation hours by month (Denver 1960-2023)"
         )

dfDailyDEN %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_sum) %>%
    ggplot(aes(x=month, y=precipitation_sum)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation (mm)", 
         title="Daily precipitation by month (Denver 1960-2023)"
         )

Boxplots for snow are also created:

dfDailyDEN %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, snowfall_sum) %>%
    ggplot(aes(x=month, y=snowfall_sum)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily snowfall (liquid equivalent mm)", 
         title="Daily snowfall by month (Denver 1960-2023)"
         )

Boxplots for temperature are created:

dfDailyDEN %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_max) %>%
    ggplot(aes(x=month, y=temperature_2m_max)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily high temperature (C)", 
         title="Daily high temperature by month (Denver 1960-2023)"
         )

dfDailyDEN %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_min) %>%
    ggplot(aes(x=month, y=temperature_2m_min)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily low temperature (C)", 
         title="Daily low temperature by month (Denver 1960-2023)"
         )

ACF and PACF are explored for maximum temperature:

acfTemp <- acf(dfDailyDEN$temperature_2m_max, lag.max=1000)

as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 369 729
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 181 545 915
pacfTemp <- pacf(dfDailyDEN$temperature_2m_max, lag.max=1000)

pacf(dfDailyDEN$temperature_2m_max, lag.max=50)

As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year

ACF and PACF are explored for precipitation:

acfPrecip <- acf(dfDailyDEN$precipitation_sum, lag.max=1000)

acf(dfDailyDEN$precipitation_sum, lag.max=50)

as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
##  [1]  57  77  97 127 149 176 197 212 223 242 269 298 316 333 360 371 399 422 455
## [20] 471 498 535 562 580 591 606 629 652 673 687 708 735 754 785 800 819 830 859
## [39] 871 884 905 918 939 953 981
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  16  43  61  91 121 143 169 183 199 217 235 251 265 279 304 337 391 406 427
## [20] 443 463 514 544 575 595 613 627 666 703 730 751 767 798 816 840 866 897 920
## [39] 942 965 979 990
pacfPrecip <- pacf(dfDailyDEN$precipitation_sum, lag.max=1000)

pacf(dfDailyDEN$precipitation_sum, lag.max=50)

Precipitation by contrast has little seasonality and mostly just a slight correlation to the previous day’s value

ACF and PACF are explored for wind speed:

acfWind <- acf(dfDailyDEN$windspeed_10m_max, lag.max=1000)

acf(dfDailyDEN$windspeed_10m_max, lag.max=50)

as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
##  [1]  46  68  89 111 148 170 197 215 237 282 302 342 373 407 454 476 488 509 535
## [20] 557 584 596 614 708 723 738 811 835 858 880 903 915 927 961
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  76  96 108 141 184 202 229 242 255 272 293 305 326 377 402 424 443 456 473
## [20] 503 515 544 582 602 627 688 713 730 762 776 803 832 870 885 909 921 951 966
pacfWind <- pacf(dfDailyDEN$windspeed_10m_max, lag.max=1000)

pacf(dfDailyDEN$windspeed_10m_max, lag.max=50)

Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are very low in magnitude (~0.05 for wind speed vs. ~0.75 for temperature)

A boxplot for delta daily temperature is created:

dfDailyDEN %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=temperature_2m_max-lag(temperature_2m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum temperature")

Daily temperature changes are generally larger in magnitude during the colder season

A boxplot for delta daily wind speed is created:

dfDailyDEN %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=windspeed_10m_max-lag(windspeed_10m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum wind speed")

Daily wind speed changes are generally similar in magnitude by season

A boxplot for delta daily precipitation is created:

dfDailyDEN %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=precipitation_sum-lag(precipitation_sum)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in precipitation")

Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers

Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21 <- dfDailyDEN %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21 %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean") + 
    theme(legend.position="none")

Outliers are likely much more important in driving average precipitation than in driving average temperature

Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21_sd <- dfDailyDEN %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    group_by(doy) %>%
    mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
    ungroup() %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21_sd %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean)) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") + 
    theme(legend.position="none")

Means and standard deviations are plotted together:

df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- daily standard deviation", 
         title="Rolling 21-day mean +/- one rolling 21-day sd"
         ) + 
    theme(legend.position="none")

Means and approximate SEM are plotted together:

nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- 1 SEM (approx)", 
         title="Rolling 21-day mean +/- 1 SEM (approx)"
         ) + 
    theme(legend.position="none")

Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude. Of interest, the seasonal spikiness in precipitationresults in many times of the year that are outside of one standard error of the mean of each other. This is in contrast to cities explored previously where SEM is typically greater than change in means by time of year

Long term daily data is downloaded for Phoenix, AZ:

# Daily data download for Phoenix, AZ
testURLDaily <- helperOpenMeteoURL(cityName="Phoenix AZ", 
                                   dailyIndices=1:nrow(tblMetricsDaily),
                                   startDate="1960-01-01", 
                                   endDate="2023-12-31", 
                                   tz="US/Arizona"
                                   )
## 
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=33.54&longitude=-112.07&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FArizona"
# Download file
if(!file.exists("testOM_daily_phx.json")) {
    fileDownload(fileName="testOM_daily_phx.json", url=testURLDaily)
} else {
    cat("\nFile testOM_daily_phx.json already exists, skipping download\n")
}
## 
## File testOM_daily_phx.json already exists, skipping download

The daily dataset is loaded:

# Read daily JSON file
phxOMDaily <- formatOpenMeteoJSON("testOM_daily_phx.json")
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## $tblDaily
## # A tibble: 23,376 × 18
##    date       time       weathercode temperature_2m_max temperature_2m_min
##    <date>     <chr>            <int>              <dbl>              <dbl>
##  1 1960-01-01 1960-01-01          51                7.9                0.3
##  2 1960-01-02 1960-01-02           1                7.4               -3.6
##  3 1960-01-03 1960-01-03           0                9.4               -4  
##  4 1960-01-04 1960-01-04           1               10                 -3  
##  5 1960-01-05 1960-01-05           0                9.4               -3.5
##  6 1960-01-06 1960-01-06           0                9.5               -3.4
##  7 1960-01-07 1960-01-07           0               13                 -1.9
##  8 1960-01-08 1960-01-08           0               17                 -0.4
##  9 1960-01-09 1960-01-09           3               17.4                1.7
## 10 1960-01-10 1960-01-10          51               13.8                5.9
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## #   apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## #   snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## #   windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## #   winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## #   et0_fao_evapotranspiration <dbl>
## 
## $tblHourly
## NULL
## 
## $tblUnits
## # A tibble: 17 × 4
##    metricType  name                       value      description                
##    <chr>       <chr>                      <chr>      <chr>                      
##  1 daily_units time                       "iso8601"  <NA>                       
##  2 daily_units weathercode                "wmo code" The most severe weather co…
##  3 daily_units temperature_2m_max         "deg C"    Maximum and minimum daily …
##  4 daily_units temperature_2m_min         "deg C"    Maximum and minimum daily …
##  5 daily_units apparent_temperature_max   "deg C"    Maximum and minimum daily …
##  6 daily_units apparent_temperature_min   "deg C"    Maximum and minimum daily …
##  7 daily_units precipitation_sum          "mm"       Sum of daily precipitation…
##  8 daily_units rain_sum                   "mm"       Sum of daily rain          
##  9 daily_units snowfall_sum               "cm"       Sum of daily snowfall      
## 10 daily_units precipitation_hours        "h"        The number of hours with r…
## 11 daily_units sunrise                    "iso8601"  Sun rise and set times     
## 12 daily_units sunset                     "iso8601"  Sun rise and set times     
## 13 daily_units windspeed_10m_max          "km/h"     Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max          "km/h"     Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg "     Dominant wind direction    
## 16 daily_units shortwave_radiation_sum    "MJ/m²"    The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm"       Daily sum of ET0 Reference…
## 
## $tblDescription
## # A tibble: 1 × 7
##   latitude longitude generationtime_ms utc_offset_seconds timezone  
##      <dbl>     <dbl>             <dbl>              <int> <chr>     
## 1     33.6     -112.             6061.             -25200 US/Arizona
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
## 
## 
## latitude: 33.56766
## longitude: -112.0818
## generationtime_ms: 6061.302
## utc_offset_seconds: -25200
## timezone: US/Arizona
## timezone_abbreviation: GMT-7
## elevation: 366
# Sample records of tibble
phxOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <int> 51, 1, 0, 1, 0, 0, 0, 0, 3, 51, 51, 61, 61,…
## $ temperature_2m_max         <dbl> 7.9, 7.4, 9.4, 10.0, 9.4, 9.5, 13.0, 17.0, …
## $ temperature_2m_min         <dbl> 0.3, -3.6, -4.0, -3.0, -3.5, -3.4, -1.9, -0…
## $ apparent_temperature_max   <dbl> 4.9, 4.1, 6.4, 6.4, 6.3, 6.5, 10.8, 15.2, 1…
## $ apparent_temperature_min   <dbl> -3.1, -7.8, -8.2, -7.8, -8.0, -7.7, -5.6, -…
## $ precipitation_sum          <dbl> 1.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum                   <dbl> 1.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum               <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 10, 0, 0, 0, 0, 0, 0, 0, 0, 1, 14, 17, 18, …
## $ sunrise                    <chr> "1960-01-01T07:32", "1960-01-02T07:32", "19…
## $ sunset                     <chr> "1960-01-01T17:30", "1960-01-02T17:31", "19…
## $ windspeed_10m_max          <dbl> 10.0, 9.9, 10.7, 12.3, 12.4, 10.9, 8.0, 8.4…
## $ windgusts_10m_max          <dbl> 29.9, 17.3, 16.9, 24.8, 19.8, 17.3, 14.4, 1…
## $ winddirection_10m_dominant <int> 206, 43, 19, 17, 23, 49, 41, 68, 75, 117, 1…
## $ shortwave_radiation_sum    <dbl> 11.08, 11.07, 13.10, 12.98, 13.19, 13.13, 1…
## $ et0_fao_evapotranspiration <dbl> 1.30, 1.26, 1.59, 1.69, 1.65, 1.56, 1.63, 1…

Variables are converted to proper data type:

dfDailyPHX <- phxOMDaily$tblDaily %>%
    mutate(weathercode=factor(weathercode), 
           sunrise_chr=sunrise, 
           sunset_chr=sunset,
           sunrise=lubridate::ymd_hm(sunrise), 
           sunset=lubridate::ymd_hm(sunset), 
           fct_winddir=factor(winddirection_10m_dominant)
           )
glimpse(dfDailyPHX)
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 51, 1, 0, 1, 0, 0, 0, 0, 3, 51, 51, 61, 61,…
## $ temperature_2m_max         <dbl> 7.9, 7.4, 9.4, 10.0, 9.4, 9.5, 13.0, 17.0, …
## $ temperature_2m_min         <dbl> 0.3, -3.6, -4.0, -3.0, -3.5, -3.4, -1.9, -0…
## $ apparent_temperature_max   <dbl> 4.9, 4.1, 6.4, 6.4, 6.3, 6.5, 10.8, 15.2, 1…
## $ apparent_temperature_min   <dbl> -3.1, -7.8, -8.2, -7.8, -8.0, -7.7, -5.6, -…
## $ precipitation_sum          <dbl> 1.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum                   <dbl> 1.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum               <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 10, 0, 0, 0, 0, 0, 0, 0, 0, 1, 14, 17, 18, …
## $ sunrise                    <dttm> 1960-01-01 07:32:00, 1960-01-02 07:32:00, …
## $ sunset                     <dttm> 1960-01-01 17:30:00, 1960-01-02 17:31:00, …
## $ windspeed_10m_max          <dbl> 10.0, 9.9, 10.7, 12.3, 12.4, 10.9, 8.0, 8.4…
## $ windgusts_10m_max          <dbl> 29.9, 17.3, 16.9, 24.8, 19.8, 17.3, 14.4, 1…
## $ winddirection_10m_dominant <int> 206, 43, 19, 17, 23, 49, 41, 68, 75, 117, 1…
## $ shortwave_radiation_sum    <dbl> 11.08, 11.07, 13.10, 12.98, 13.19, 13.13, 1…
## $ et0_fao_evapotranspiration <dbl> 1.30, 1.26, 1.59, 1.69, 1.65, 1.56, 1.63, 1…
## $ sunrise_chr                <chr> "1960-01-01T07:32", "1960-01-02T07:32", "19…
## $ sunset_chr                 <chr> "1960-01-01T17:30", "1960-01-02T17:31", "19…
## $ fct_winddir                <fct> 206, 43, 19, 17, 23, 49, 41, 68, 75, 117, 1…

Averages by month for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyPHX %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year, month) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    group_by(month) %>%
    summarize(across(-c(year), .fns=mean)) %>%
    pivot_longer(-c(month)) %>%
    ggplot(aes(x=month, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Phoenix 1960-2023)")

Averages by month for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyPHX %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(month, wc, winddir) %>%
    pivot_longer(-c(month)) %>%
    count(month, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=month, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=2) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Phoenix 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Averages by year for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyPHX %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    pivot_longer(-c(year)) %>%
    ggplot(aes(x=year, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Annual averages for key metrics (Phoenix 1960-2023)")

Averages by year for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyPHX %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(year, wc, winddir) %>%
    pivot_longer(-c(year)) %>%
    count(year, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=year, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=1) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Phoenix 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Boxplots for maximum windspeed are created:

dfDailyPHX %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, windspeed_10m_max) %>%
    ggplot(aes(x=month, y=windspeed_10m_max)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Maximum daily wind speed (kph)", 
         title="Maximum daily windspeed by month (Phoenix 1960-2023)"
         )

Boxplots for precipitation are created:

dfDailyPHX %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_hours) %>%
    ggplot(aes(x=month, y=precipitation_hours)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation hours", 
         title="Daily precipitation hours by month (Phoenix 1960-2023)"
         )

dfDailyPHX %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_sum) %>%
    ggplot(aes(x=month, y=precipitation_sum)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation (mm)", 
         title="Daily precipitation by month (Phoenix 1960-2023)"
         )

Boxplots for temperature are created:

dfDailyPHX %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_max) %>%
    ggplot(aes(x=month, y=temperature_2m_max)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily high temperature (C)", 
         title="Daily high temperature by month (Phoenix 1960-2023)"
         )

dfDailyPHX %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_min) %>%
    ggplot(aes(x=month, y=temperature_2m_min)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily low temperature (C)", 
         title="Daily low temperature by month (Phoenix 1960-2023)"
         )

ACF and PACF are explored for maximum temperature:

acfTemp <- acf(dfDailyPHX$temperature_2m_max, lag.max=1000)

as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 368 731
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 184 546 913
pacfTemp <- pacf(dfDailyPHX$temperature_2m_max, lag.max=1000)

pacf(dfDailyPHX$temperature_2m_max, lag.max=50)

As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year

ACF and PACF are explored for precipitation:

acfPrecip <- acf(dfDailyPHX$precipitation_sum, lag.max=1000)

acf(dfDailyPHX$precipitation_sum, lag.max=50)

as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
##  [1]  20  40  82 105 138 154 171 192 219 230 252 271 294 324 350 368 387 398 425
## [20] 466 478 500 516 542 560 597 620 632 653 686 730 750 761 777 809 845 866 886
## [39] 899 920 941 964
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  27  48  73  91 119 152 186 205 229 247 267 283 300 318 329 340 365 380 406
## [20] 420 445 486 505 547 566 584 624 645 663 674 700 715 742 754 768 803 819 842
## [39] 856 872 890 909 929 972
pacfPrecip <- pacf(dfDailyPHX$precipitation_sum, lag.max=1000)

pacf(dfDailyPHX$precipitation_sum, lag.max=50)

Precipitation by contrast has little seasonality and mostly just a slight correlation to the previous day’s value

ACF and PACF are explored for wind speed:

acfWind <- acf(dfDailyPHX$windspeed_10m_max, lag.max=1000)

acf(dfDailyPHX$windspeed_10m_max, lag.max=50)

as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
##  [1]  70 163 194 217 357 481 553 577 600 688 716 744 776 895 923 935 947 974
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  26 166 182 197 240 253 365 400 489 521 542 556 585 722 768 890 917 932
pacfWind <- pacf(dfDailyPHX$windspeed_10m_max, lag.max=1000)

pacf(dfDailyPHX$windspeed_10m_max, lag.max=50)

Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are lower in magnitude (~0.20 for wind speed vs. ~0.75 for temperature)

A boxplot for delta daily temperature is created:

dfDailyPHX %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=temperature_2m_max-lag(temperature_2m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum temperature")

Daily temperature changes are generally larger in magnitude during Spring

A boxplot for delta daily wind speed is created:

dfDailyPHX %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=windspeed_10m_max-lag(windspeed_10m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum wind speed")

Daily wind speed changes are generally similar in magnitude by season

A boxplot for delta daily precipitation is created:

dfDailyPHX %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=precipitation_sum-lag(precipitation_sum)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in precipitation")

Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers

Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21 <- dfDailyPHX %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21 %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean") + 
    theme(legend.position="none")

Outliers are likely much more important in driving average precipitation than in driving average temperature

Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21_sd <- dfDailyPHX %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    group_by(doy) %>%
    mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
    ungroup() %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21_sd %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean)) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") + 
    theme(legend.position="none")

Means and standard deviations are plotted together:

df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- daily standard deviation", 
         title="Rolling 21-day mean +/- one rolling 21-day sd"
         ) + 
    theme(legend.position="none")

Means and approximate SEM are plotted together:

nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- 1 SEM (approx)", 
         title="Rolling 21-day mean +/- 1 SEM (approx)"
         ) + 
    theme(legend.position="none")

Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude. Of interest, the seasonal spikiness in precipitation results in many times of the year that are outside of one standard error of the mean of each other. This is in contrast to cities explored previously where SEM is typically greater than change in means by time of year

The download process is converted to functional form:

helperNewCityDailyDownload <- function(cityName, 
                                       tz, 
                                       abb,
                                       fName=paste0("testOM_daily_", abb, ".json"),
                                       idx=1:nrow(tblMetricsDaily), 
                                       startDate="1960-01-01", 
                                       endDate="2023-12-31"
                                       ) {

    testURLDaily <- helperOpenMeteoURL(cityName=cityName, 
                                       dailyIndices=idx,
                                       startDate=startDate, 
                                       endDate=endDate, 
                                       tz=tz
                                       )
    cat("\nDownload URL:\n")
    print(testURLDaily)

    # Download file
    if(!file.exists(fName)) {
        fileDownload(fileName=fName, url=testURLDaily)
    } else {
        cat("\nFile", fName, "already exists, skipping download\n")
    }

}

The function is run for a new city:

helperNewCityDailyDownload(cityName="Kansas City MO", tz = "US/Central", abb="mci")
## 
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration 
## 
## 
## Download URL:
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=39.12&longitude=-94.55&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FCentral"
## 
## File testOM_daily_mci.json already exists, skipping download

Creation of an analysis file is converted to functional form:

createDailyDF <- function(fName=NULL, abb=NULL, glimpseFOMJ=FALSE, glimpseList=FALSE, glimpseDF=TRUE) {
    
    # Create fName from abb if fName not passed
    if(is.null(fName)) {
        if(is.null(abb)) stop("\nMust provide abb or fName to function createDailyDF()\n")
        else fName <- paste0("testOM_daily_", abb, ".json")
    }
    
    # Read daily JSON file
    lstDaily <- formatOpenMeteoJSON(fName, glimpseData=glimpseFOMJ)

    # Sample records of tibble inside list
    if(isTRUE(glimpseList)) lstDaily$tblDaily %>% glimpse()

    # Convert variables to proper type
    dfDaily <- lstDaily$tblDaily %>%
        mutate(weathercode=factor(weathercode), 
               sunrise_chr=sunrise, 
               sunset_chr=sunset,
               sunrise=lubridate::ymd_hm(sunrise), 
               sunset=lubridate::ymd_hm(sunset), 
               fct_winddir=factor(winddirection_10m_dominant)
               )
    
    # Sample records of final tibble
    if(isTRUE(glimpseDF)) glimpse(dfDaily)

    # Return final tibble
    dfDaily
    
}

The function is tested:

dfDaily <- createDailyDF(abb="mci")
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 71, 3, 3, 3, 3, 0, 3, 2, 3, 51, 55, 63, 51,…
## $ temperature_2m_max         <dbl> 6.4, 5.5, -3.1, -0.6, -5.1, 5.6, 7.3, 8.6, …
## $ temperature_2m_min         <dbl> 0.8, -4.8, -9.6, -12.6, -10.6, -6.6, -1.1, …
## $ apparent_temperature_max   <dbl> 1.0, 0.3, -8.7, -5.9, -9.1, 0.2, 3.2, 4.9, …
## $ apparent_temperature_min   <dbl> -4.5, -11.6, -15.3, -17.6, -16.7, -12.1, -6…
## $ precipitation_sum          <dbl> 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum                   <dbl> 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum               <dbl> 0.14, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 16, 1, 12,…
## $ sunrise                    <dttm> 1960-01-01 08:37:00, 1960-01-02 08:37:00, …
## $ sunset                     <dttm> 1960-01-01 18:05:00, 1960-01-02 18:06:00, …
## $ windspeed_10m_max          <dbl> 25.9, 24.6, 22.4, 14.0, 16.2, 18.8, 19.1, 1…
## $ windgusts_10m_max          <dbl> 53.3, 49.7, 47.5, 29.9, 35.6, 36.7, 36.7, 2…
## $ winddirection_10m_dominant <int> 153, 263, 306, 268, 346, 221, 232, 171, 193…
## $ shortwave_radiation_sum    <dbl> 2.66, 9.49, 9.46, 10.03, 6.87, 10.38, 10.13…
## $ et0_fao_evapotranspiration <dbl> 0.95, 1.25, 1.03, 0.88, 0.72, 1.56, 1.39, 1…
## $ sunrise_chr                <chr> "1960-01-01T08:37", "1960-01-02T08:37", "19…
## $ sunset_chr                 <chr> "1960-01-01T18:05", "1960-01-02T18:06", "19…
## $ fct_winddir                <fct> 153, 263, 306, 268, 346, 221, 232, 171, 193…
dfDaily
## # A tibble: 23,376 × 21
##    date       time       weathercode temperature_2m_max temperature_2m_min
##    <date>     <chr>      <fct>                    <dbl>              <dbl>
##  1 1960-01-01 1960-01-01 71                         6.4                0.8
##  2 1960-01-02 1960-01-02 3                          5.5               -4.8
##  3 1960-01-03 1960-01-03 3                         -3.1               -9.6
##  4 1960-01-04 1960-01-04 3                         -0.6              -12.6
##  5 1960-01-05 1960-01-05 3                         -5.1              -10.6
##  6 1960-01-06 1960-01-06 0                          5.6               -6.6
##  7 1960-01-07 1960-01-07 3                          7.3               -1.1
##  8 1960-01-08 1960-01-08 2                          8.6               -2.7
##  9 1960-01-09 1960-01-09 3                         11.8                1.6
## 10 1960-01-10 1960-01-10 51                         5.9                4  
## # ℹ 23,366 more rows
## # ℹ 16 more variables: apparent_temperature_max <dbl>,
## #   apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## #   snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <dttm>,
## #   sunset <dttm>, windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## #   winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## #   et0_fao_evapotranspiration <dbl>, sunrise_chr <chr>, sunset_chr <chr>, …

A function is written to plot continuous variable averages by month:

plotContVarMean <- function(df, 
                            tmpMapNames=c("precipitation_sum"="3. Precipitation (mm)", 
                                          "windspeed_10m_max"="4. Windspeed (kph)", 
                                          "temperature_2m_max"="1. High Temperature (C)",
                                          "temperature_2m_min"="2. Low Temperature (C)"
                                          ), 
                            idxMapSum=c(1),
                            isMonthly=TRUE,
                            titleStarter=if(isTRUE(isMonthly)) "Monthly" else "Annual",
                            titleLoc="",
                            titleEnder=paste0("(", titleLoc, if(str_length(titleLoc)>0) " ", "1960-2023)"),
                            printPlot=TRUE, 
                            returnPlot=!isTRUE(printPlot)
                            ) {

    # FUNCTION ARGUMENTS:
    # df: tibble or data frame from createDailyDF()
    # tmpMapNames: variables to summarize in format c("variablename"="variable description")
    # idxMapSum: indices of the variables in tmpMapNames thatshould be summed rather than averaged
    # isMonthly: boolean with TRUE being monthly and FALSE being annual
    # titleStarter: opening phrase for title
    # titleLoc: location to be used in parentheses at end of title ("" means just show year range)
    # titleEnder: ending phrase for title
    # printPlot: boolean, should the plot be printed?
    # returnPlot: boolean, should the plot be returned?
    
    p1 <- df %>%
        mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
        select(all_of(c("year", if(isTRUE(isMonthly)) "month", names(tmpMapNames)))) %>%
        group_by(pick(if(isTRUE(isMonthly)) c("year", "month") else c("year"))) %>%
        summarize(across(-all_of(names(tmpMapNames)[idxMapSum]), .fns=mean), 
                  across(all_of(names(tmpMapNames)[idxMapSum]), .fns=sum), 
                  .groups="drop"
                  ) 
    
    if(isTRUE(isMonthly)) {
        p1 <- p1 %>%
            group_by(month) %>%
            summarize(across(-c(year), .fns=mean)) %>%
            pivot_longer(-c(month))
    } else {
        p1 <- p1 %>%
            pivot_longer(-c(year))
    }
    
    p1 <- p1 %>%
        ggplot(aes(x=.data[[if(isTRUE(isMonthly)) "month" else "year"]], y=value)) + 
        geom_line(aes(group=1)) + 
        facet_wrap(~tmpMapNames[name], scales="free_y") + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(titleStarter, " averages for key metrics ", titleEnder)
             )
    
    if(isTRUE(printPlot)) print(p1)
    if(isTRUE(returnPlot)) p1
    
}

The function is tested for select continuous variables:

plotContVarMean(dfDaily, isMonthly=TRUE, titleLoc="Kansas City, MO")

plotContVarMean(dfDaily, isMonthly=FALSE, titleLoc="Kansas City, MO")

A function is written to plot continuous variable averages by month:

plotCatVarMean <- function(df, 
                           tmpMapNames=c("wc"="1. Weather Type", 
                                         "winddir"="2. Predominant Wind Direction"
                                         ), 
                           idxMapSum=c(1),
                           isMonthly=TRUE,
                           titleStarter=if(isTRUE(isMonthly)) "Monthly" else "Annual",
                           titleLoc="",
                           titleEnder=paste0("(", titleLoc, if(str_length(titleLoc)>0) " ", "1960-2023)"),
                           printPlot=TRUE, 
                           returnPlot=!isTRUE(printPlot)
                           ) {

    tmpDFPlot <- df %>%
        mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
               year=year(date), 
               winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                                 winddirection_10m_dominant>=315~"1. N", 
                                 winddirection_10m_dominant>=225~"2. W", 
                                 winddirection_10m_dominant>=135~"3. S", 
                                 winddirection_10m_dominant>=45~"4. E", 
                                 winddirection_10m_dominant>=0~"1. N", 
                                 TRUE~"Invalid"
                                 ), 
               wc=case_when(weathercode==0~"1. Clear", 
                            weathercode %in% c(1, 2, 3)~"2. Dry", 
                            weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                            weathercode %in% c(61, 63, 65)~"4. Rain", 
                            weathercode %in% c(71, 73, 75)~"5. Snow", 
                            TRUE~"Error"
                            )
               ) %>%
        select(all_of(c(if(isTRUE(isMonthly)) "month" else "year", names(tmpMapNames))))

    if(isTRUE(isMonthly)) {
        tmpDFPlot <- tmpDFPlot %>%
            pivot_longer(-c(month)) %>%
            count(month, name, value)
    } else {
        tmpDFPlot <- tmpDFPlot %>%
            pivot_longer(-c(year)) %>%
            count(year, name, value)
    }
        
    tmpPlotFN <- function(x) {
        p1 <- tmpDFPlot %>%
            filter(name==x) %>%
            ggplot(aes(x=.data[[if(isTRUE(isMonthly)) "month" else "year"]], y=n)) + 
            geom_line(aes(group=value, color=value), lwd=2) + 
            labs(x=NULL, 
                 y=NULL, 
                 title=paste0(titleStarter, " average for ", tmpMapNames[x], " ", titleEnder)
                 ) + 
            scale_color_discrete(NULL) + 
            theme(legend.position = "bottom")
        return(p1)
    }

    # Update to be more flexible
    grid::grid.newpage()
    p1 <- gridExtra::arrangeGrob(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

    if(isTRUE(printPlot)) grid::grid.draw(p1)
    if(isTRUE(returnPlot)) p1
    
}

The function is tested for select categorical variables:

plotCatVarMean(dfDaily, isMonthly=TRUE, titleLoc="Kansas City, MO")

plotCatVarMean(dfDaily, isMonthly=FALSE, titleLoc="Kansas City, MO")

A function is written for creating weather boxplots:

omCreateBoxPlot <- function(df, 
                            keyVar, 
                            ymin=NA, 
                            ymax=NA, 
                            mapDesc=c("windspeed_10m_max"="Maximum wind speed (kph)", 
                                      "precipitation_hours"="Precipitation (hours)", 
                                      "precipitation_sum"="Precipitation (mm)", 
                                      "temperature_2m_max"="Maximum Temperature (C)", 
                                      "temperature_2m_min"="Minimum Temperature (C)"
                                      ),
                            keyVarDesc=if(keyVar %in% names(mapDesc)) mapDesc[keyVar] else "Key Variable",
                            titleLoc="",
                            titleEnder=paste0("(", titleLoc, if(str_length(titleLoc)>0) " ", "1960-2023)")
                            ) {

    p1 <- df %>%
        mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
        select(all_of(c("month", keyVar))) %>%
        ggplot(aes(x=month, y=.data[[keyVar]])) + 
        geom_boxplot(fill="lightblue") + 
        labs(x=NULL, 
             y=keyVarDesc, 
             title=paste0(keyVarDesc, " by month ", titleEnder)
             )
    if(!is.na(ymin) | !is.na(ymax)) p1 <- p1 + lims(y=c(ymin, ymax))
    print(p1)
    
}

The function is tested for select variables:

omCreateBoxPlot(dfDaily, keyVar="windspeed_10m_max", ymin=0, titleLoc="Kansas City, MO")

omCreateBoxPlot(dfDaily, keyVar="precipitation_hours", ymin=0, titleLoc="Kansas City, MO")

omCreateBoxPlot(dfDaily, keyVar="precipitation_sum", ymin=0, titleLoc="Kansas City, MO")

omCreateBoxPlot(dfDaily, keyVar="temperature_2m_max", titleLoc="Kansas City, MO")

omCreateBoxPlot(dfDaily, keyVar="temperature_2m_min", titleLoc="Kansas City, MO")

A function is created for running ACF/PACF:

omACFPACF <- function(df, keyVar, bigLag=1000, smallLag=50, smallACF=FALSE, smallPACF=TRUE) {

    # Big lag for ACF
    acfTemp <- acf(df %>% pull(keyVar), lag.max=bigLag, main=paste0("ACF: ", keyVar))

    # Peaks for ACF
    cat("\nACF peaks\n")
    as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which() %>% print()

    # Troughs for ACF
    cat("\nACF troughs\n")
    as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which() %>% print()

    # Small lag for ACF (if requested)
    if(isTRUE(smallACF)) acfTemp <- acf(df %>% pull(keyVar), lag.max=smallLag, main=paste0("ACF: ", keyVar))
    
    # Big lag for PACF
    pacfTemp <- pacf(df %>% pull(keyVar), lag.max=bigLag, main=paste0("PACF: ", keyVar))
    
    # Small lag for PACF (if requested)
    if(isTRUE(smallPACF)) pacfTemp <- pacf(df %>% pull(keyVar), lag.max=smallLag, main=paste0("PACF: ", keyVar))
    
}

The function is tested for select variables:

omACFPACF(dfDaily, keyVar="temperature_2m_max")

## 
## ACF peaks
## [1] 369 726
## 
## ACF troughs
## [1] 180 547 915

omACFPACF(dfDaily, keyVar="precipitation_sum", smallACF=TRUE)

## 
## ACF peaks
##  [1]  23  34  53  67  87 115 147 166 187 207 226 243 262 277 288 349 361 374 385
## [20] 404 441 459 481 504 533 561 579 600 620 635 650 673 685 696 712 723 741 756
## [39] 767 785 820 836 881 913 925 948 962 974
## 
## ACF troughs
##  [1]  24  36  59  76  91 111 129 144 158 184 224 251 269 280 305 319 334 371 389
## [20] 402 420 462 485 508 531 550 563 585 604 628 658 678 731 743 777 799 823 842
## [39] 862 875 893 916 936 955 968 982

omACFPACF(dfDaily, keyVar="windspeed_10m_max", smallACF=TRUE)

## 
## ACF peaks
##  [1] 140 158 177 202 214 238 373 462 477 489 511 533 561 580 612 628 722 735 788
## [20] 848 883 900 919 933 952 964 982
## 
## ACF troughs
##  [1] 125 142 160 180 208 221 240 319 486 506 547 589 624 638 840 855 875 897 910
## [20] 925 948 966 989

Function omCreateBoxPlot() is updated to allow for converting a variable to a change:

omCreateBoxPlot <- function(df, 
                            keyVar, 
                            chgLag=NULL,
                            ymin=NA, 
                            ymax=NA, 
                            mapDesc=c("windspeed_10m_max"="Maximum wind speed (kph)", 
                                      "precipitation_hours"="Precipitation (hours)", 
                                      "precipitation_sum"="Precipitation (mm)", 
                                      "temperature_2m_max"="Maximum Temperature (C)", 
                                      "temperature_2m_min"="Minimum Temperature (C)"
                                      ),
                            keyVarDesc=if(keyVar %in% names(mapDesc)) mapDesc[keyVar] else "Key Variable",
                            titleLoc="",
                            titleEnder=paste0("(", titleLoc, if(str_length(titleLoc)>0) " ", "1960-2023)")
                            ) {

    p1 <- df %>%
        mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
        select(all_of(c("month", keyVar)))
    
    if(!is.null(chgLag)) 
        p1 <- p1 %>%
            mutate(across(all_of(keyVar), .fns=function(x) x - lag(x, chgLag))) %>%
            filter(if_all(all_of(keyVar), .fns=function(x) !is.na(x)))
    
    p1 <- p1 %>%
        ggplot(aes(x=month, y=.data[[keyVar]])) + 
        geom_boxplot(fill="lightblue") + 
        labs(x=NULL, 
             y=keyVarDesc, 
             title=paste0(if(is.null(chgLag)) "" else "Change in ", keyVarDesc, " by month ", titleEnder)
             )
    if(!is.na(ymin) | !is.na(ymax)) p1 <- p1 + lims(y=c(ymin, ymax))
    print(p1)
    
}

The updated function is tested for select variables:

omCreateBoxPlot(dfDaily, keyVar="temperature_2m_max", chgLag=1, titleLoc="Kansas City, MO")

omCreateBoxPlot(dfDaily, keyVar="windspeed_10m_max", chgLag=1, titleLoc="Kansas City, MO")

omCreateBoxPlot(dfDaily, keyVar="precipitation_sum", chgLag=1, titleLoc="Kansas City, MO")

A function is written to calculate means and standard deviations:

omDailyMeanSD <- function(df, 
                          rollVars, 
                          rollK,                             
                          mapDesc=c("windspeed_10m_max"="windMax", 
                                    "precipitation_hours"="precipHours", 
                                    "precipitation_sum"="precipSum", 
                                    "temperature_2m_max"="tempMax", 
                                    "temperature_2m_min"="tempMin"
                                    ), 
                          addSuffix=TRUE, 
                          makeSEM=FALSE, 
                          printPlot=TRUE, 
                          returnPlot=!isTRUE(printPlot)
                          ) {
    
    if(isTRUE(addSuffix)) for(x in names(mapDesc)) mapDesc[x] <- paste0(mapDesc[x], "_r", rollK)
    
    # Calculate means
    df_r21 <- df %>% mutate(doy=pmin(yday(date), 365))
    
    for(x in rollVars) {
        df_r21 <- df_r21 %>% 
            helperRollingAgg(origVar=x, newName=unname(mapDesc[x]), k=rollK)
    }
    
    df_r21 <- df_r21 %>% select(all_of(c("date", "doy", unname(mapDesc[rollVars]))))

    # Calculate standard deviations
    df_r21_sd <- df %>%
        mutate(doy=pmin(yday(date), 365)) %>% 
        group_by(doy) %>%
        mutate(across(.cols=all_of(rollVars), .fns=sd)) %>%
        ungroup()
    
    for(x in rollVars) { 
        df_r21_sd <- df_r21_sd %>% 
            helperRollingAgg(origVar=x, newName=unname(mapDesc[x]), k=rollK)
    }
    
    df_r21_sd <- df_r21_sd %>% select(all_of(c("date", "doy", unname(mapDesc[rollVars]))))
    
    
    # Create plot of means and standard deviations or standard errors of the mean
    if(isTRUE(makeSEM)) divBy <- sqrt(length(unique(year(df_r21$date)))-1)
    else divBy <- 1
    
    p1 <- df_r21 %>%
        bind_rows(df_r21_sd, .id="src") %>%
        na.omit() %>%
        mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
        group_by(doy, musig) %>%
        summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
        pivot_longer(cols=-c(doy, musig)) %>%
        pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
        ggplot(aes(x=doy)) + 
        geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
        geom_ribbon(aes(ymin=Mean-SD/divBy, ymax=Mean+SD/divBy, fill=name), alpha=0.5) +
        facet_wrap(~name, scales="free_y") +
        labs(x="Day of Year", 
             y=paste0("Rolling ", 
                      rollK, 
                      "-day mean +/- 1 ", 
                      if(isTRUE(makeSEM)) "SEM (approx)" else "sd"
                      ), 
             title=paste0("Rolling ", 
                          rollK, 
                          "-day mean +/- 1 rolling ", 
                          rollK, 
                          "-day ", 
                          if(isTRUE(makeSEM)) "SEM (approx)" else "sd"
                          )
             ) + 
        theme(legend.position="none")
    
    if(isTRUE(printPlot)) print(p1)
    if(isTRUE(returnPlot)) return(p1)

}

The new function is tested for select variables:

# Standard deviation only
omDailyMeanSD(dfDaily, 
              rollK=21, 
              rollVars=c("temperature_2m_max", "windspeed_10m_max", "precipitation_sum")
              )

# SEM (approx)
omDailyMeanSD(dfDaily, 
              rollK=21, 
              rollVars=c("temperature_2m_max", "windspeed_10m_max", "precipitation_sum"), 
              makeSEM=TRUE
              )

The functions are run in combination for a new city:

# 0. Declare abbreviation and city name to be downloaded
abbCity <- "bna"
tzCity <- "US/Central"
dbNameCity <- "Nashville TN"
useNameCity <- dbNameCity

# 1. Download data
helperNewCityDailyDownload(cityName=dbNameCity, tz = tzCity, abb=abbCity)
## 
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration 
## 
## 
## Download URL:
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=36.17&longitude=-86.78&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FCentral"
## 
## File testOM_daily_bna.json already exists, skipping download
# 2. Load and process data
dfDaily <- createDailyDF(abb=abbCity)
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 71, 63, 53, 3, 75, 75, 73, 3, 3, 51, 51, 55…
## $ temperature_2m_max         <dbl> 8.6, 8.8, 7.4, 3.7, 2.3, 1.3, 4.0, 6.3, 8.3…
## $ temperature_2m_min         <dbl> -1.4, 5.7, 0.0, -1.1, -1.3, -0.8, -0.2, -4.…
## $ apparent_temperature_max   <dbl> 4.5, 7.2, 5.1, -0.2, -0.5, -1.2, 0.1, 3.9, …
## $ apparent_temperature_min   <dbl> -5.5, 1.1, -3.9, -4.5, -4.7, -4.2, -4.3, -8…
## $ precipitation_sum          <dbl> 0.4, 13.9, 0.9, 0.0, 12.3, 6.2, 1.8, 0.0, 0…
## $ rain_sum                   <dbl> 0.0, 13.9, 0.9, 0.0, 0.0, 0.6, 0.7, 0.0, 0.…
## $ snowfall_sum               <dbl> 0.28, 0.00, 0.00, 0.00, 8.68, 3.92, 0.77, 0…
## $ precipitation_hours        <dbl> 4, 20, 2, 0, 15, 18, 8, 0, 0, 3, 5, 4, 9, 1…
## $ sunrise                    <dttm> 1960-01-01 07:58:00, 1960-01-02 07:58:00, …
## $ sunset                     <dttm> 1960-01-01 17:42:00, 1960-01-02 17:43:00, …
## $ windspeed_10m_max          <dbl> 16.8, 19.2, 17.7, 9.4, 9.2, 8.7, 14.3, 11.6…
## $ windgusts_10m_max          <dbl> 30.2, 50.8, 37.4, 24.8, 18.4, 17.6, 26.6, 2…
## $ winddirection_10m_dominant <int> 122, 183, 294, 258, 25, 30, 238, 211, 183, …
## $ shortwave_radiation_sum    <dbl> 5.85, 1.70, 6.13, 6.70, 2.12, 3.83, 10.38, …
## $ et0_fao_evapotranspiration <dbl> 1.07, 0.42, 0.87, 0.75, 0.31, 0.40, 0.86, 1…
## $ sunrise_chr                <chr> "1960-01-01T07:58", "1960-01-02T07:58", "19…
## $ sunset_chr                 <chr> "1960-01-01T17:42", "1960-01-02T17:43", "19…
## $ fct_winddir                <fct> 122, 183, 294, 258, 25, 30, 238, 211, 183, …
dfDaily
## # A tibble: 23,376 × 21
##    date       time       weathercode temperature_2m_max temperature_2m_min
##    <date>     <chr>      <fct>                    <dbl>              <dbl>
##  1 1960-01-01 1960-01-01 71                         8.6               -1.4
##  2 1960-01-02 1960-01-02 63                         8.8                5.7
##  3 1960-01-03 1960-01-03 53                         7.4                0  
##  4 1960-01-04 1960-01-04 3                          3.7               -1.1
##  5 1960-01-05 1960-01-05 75                         2.3               -1.3
##  6 1960-01-06 1960-01-06 75                         1.3               -0.8
##  7 1960-01-07 1960-01-07 73                         4                 -0.2
##  8 1960-01-08 1960-01-08 3                          6.3               -4.8
##  9 1960-01-09 1960-01-09 3                          8.3               -3.5
## 10 1960-01-10 1960-01-10 51                        14                  4.4
## # ℹ 23,366 more rows
## # ℹ 16 more variables: apparent_temperature_max <dbl>,
## #   apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## #   snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <dttm>,
## #   sunset <dttm>, windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## #   winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## #   et0_fao_evapotranspiration <dbl>, sunrise_chr <chr>, sunset_chr <chr>, …
# 3. Plot means for continuous variables
plotContVarMean(dfDaily, isMonthly=TRUE, titleLoc=useNameCity)

plotContVarMean(dfDaily, isMonthly=FALSE, titleLoc=useNameCity)

# 4. Plot means for categorical variables
plotCatVarMean(dfDaily, isMonthly=TRUE, titleLoc=useNameCity)

plotCatVarMean(dfDaily, isMonthly=FALSE, titleLoc=useNameCity)

# 5. Create boxplots for select variables
omCreateBoxPlot(dfDaily, keyVar="windspeed_10m_max", ymin=0, titleLoc=useNameCity)

omCreateBoxPlot(dfDaily, keyVar="precipitation_hours", ymin=0, titleLoc=useNameCity)

omCreateBoxPlot(dfDaily, keyVar="precipitation_sum", ymin=0, titleLoc=useNameCity)

omCreateBoxPlot(dfDaily, keyVar="temperature_2m_max", titleLoc=useNameCity)

omCreateBoxPlot(dfDaily, keyVar="temperature_2m_min", titleLoc=useNameCity)

# 6. Analyze ACF/PACF for select variables
omACFPACF(dfDaily, keyVar="temperature_2m_max")

## 
## ACF peaks
## [1] 365 736
## 
## ACF troughs
## [1] 180 549 914

omACFPACF(dfDaily, keyVar="precipitation_sum", smallACF=TRUE)

## 
## ACF peaks
##  [1]  15  34  45  80 114 142 156 186 206 227 240 262 275 292 313 343 357 372 386
## [20] 405 417 445 469 483 495 525 572 591 605 626 654 677 689 714 728 767 791 803
## [39] 826 838 856 884 902 931 951 975
## 
## ACF troughs
##  [1]  26  57  84  95 130 162 180 197 221 232 255 284 305 317 333 355 374 395 419
## [20] 430 446 471 503 521 535 548 568 584 608 642 667 682 700 712 726 761 785 808
## [39] 821 845 859 881 898 912 934 948 960 972

omACFPACF(dfDaily, keyVar="windspeed_10m_max", smallACF=TRUE)

## 
## ACF peaks
## [1] 203 349 362 382 552 723 737 901 939
## 
## ACF troughs
## [1] 144 174 189 359 535 550 892 904 924

# 7. Create boxplots for select variables (with lag-1, daily difference)
omCreateBoxPlot(dfDaily, keyVar="temperature_2m_max", chgLag=1, titleLoc=useNameCity)

omCreateBoxPlot(dfDaily, keyVar="windspeed_10m_max", chgLag=1, titleLoc=useNameCity)

omCreateBoxPlot(dfDaily, keyVar="precipitation_sum", chgLag=1, titleLoc=useNameCity)

# 8. Create daily mean +/- 1 SD and/or SEM for select variables
omDailyMeanSD(dfDaily, rollK=21, rollVars=c("temperature_2m_max", "windspeed_10m_max", "precipitation_sum"))

omDailyMeanSD(dfDaily, rollK=21, rollVars=c("temperature_2m_max", 
                                            "windspeed_10m_max", 
                                            "precipitation_sum"
                                            ), 
              makeSEM=TRUE
              )

A function is written to run all steps for a new city:

omRunAllSteps <- function(dfDaily=NULL, 
                          dlData=FALSE, 
                          abbCity=NULL, 
                          tzCity=NULL, 
                          useNameCity="New city", 
                          dbNameCity=useNameCity, 
                          returnDF=FALSE,
                          runStats=TRUE, 
                          mainStatVars=c("temperature_2m_max", "windspeed_10m_max", "precipitation_sum"), 
                          xtraStatVars=c("temperature_2m_min", "precipitation_hours"),
                          minZeroVars=c("windspeed_10m_max", "precipitation_sum", "precipitation_hours")
                          ) {
    
    # 1. Download data (if requested)
    if(isTRUE(dlData)) helperNewCityDailyDownload(cityName=dbNameCity, tz=tzCity, abb=abbCity)
    
    # 2. Load and process data (if not passed)
    if(is.null(dfDaily)) dfDaily <- createDailyDF(abb=abbCity)
    
    # Stop processing if requested
    if(!isTRUE(runStats)) {
        if(isTRUE(returnDF)) {
            return(dfDaily)
        } else {
            return(NULL)
        }
    }
    
    # 3. Plot means for continuous variables
    plotContVarMean(dfDaily, isMonthly=TRUE, titleLoc=useNameCity)
    plotContVarMean(dfDaily, isMonthly=FALSE, titleLoc=useNameCity)

    # 4. Plot means for categorical variables
    plotCatVarMean(dfDaily, isMonthly=TRUE, titleLoc=useNameCity)
    plotCatVarMean(dfDaily, isMonthly=FALSE, titleLoc=useNameCity)

    # 5. Create boxplots for select variables
    purrr::walk(.x=c(mainStatVars, xtraStatVars), 
                .f=function(x) omCreateBoxPlot(dfDaily, 
                                               keyVar=x, 
                                               ymin=if(x %in% minZeroVars) 0 else NA, 
                                               titleLoc=useNameCity
                                               )
                )
    
    # 6. Analyze ACF/PACF for select variables
    purrr::walk(.x=mainStatVars, 
                .f=function(x) omACFPACF(dfDaily, keyVar=x, smallACF=TRUE)
                )

    # 7. Create boxplots for select variables (with lag-1, daily difference)
    purrr::walk(.x=mainStatVars, 
                .f=function(x) omCreateBoxPlot(dfDaily, keyVar=x, chgLag=1, titleLoc=useNameCity)
                )
        
    # 8. Create daily mean +/- 1 SD and/or SEM for select variables
    omDailyMeanSD(dfDaily, rollK=21, rollVars=mainStatVars)
    omDailyMeanSD(dfDaily, rollK=21, rollVars=mainStatVars, makeSEM=TRUE)
    
    # Return file if requested
    if(isTRUE(returnDF)) return(dfDaily)
    
}

The function is tested solely for loading a dataset:

dfDailyMCI <- omRunAllSteps(abbCity="mci", returnDF=TRUE, runStats=FALSE)
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 71, 3, 3, 3, 3, 0, 3, 2, 3, 51, 55, 63, 51,…
## $ temperature_2m_max         <dbl> 6.4, 5.5, -3.1, -0.6, -5.1, 5.6, 7.3, 8.6, …
## $ temperature_2m_min         <dbl> 0.8, -4.8, -9.6, -12.6, -10.6, -6.6, -1.1, …
## $ apparent_temperature_max   <dbl> 1.0, 0.3, -8.7, -5.9, -9.1, 0.2, 3.2, 4.9, …
## $ apparent_temperature_min   <dbl> -4.5, -11.6, -15.3, -17.6, -16.7, -12.1, -6…
## $ precipitation_sum          <dbl> 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum                   <dbl> 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum               <dbl> 0.14, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 16, 1, 12,…
## $ sunrise                    <dttm> 1960-01-01 08:37:00, 1960-01-02 08:37:00, …
## $ sunset                     <dttm> 1960-01-01 18:05:00, 1960-01-02 18:06:00, …
## $ windspeed_10m_max          <dbl> 25.9, 24.6, 22.4, 14.0, 16.2, 18.8, 19.1, 1…
## $ windgusts_10m_max          <dbl> 53.3, 49.7, 47.5, 29.9, 35.6, 36.7, 36.7, 2…
## $ winddirection_10m_dominant <int> 153, 263, 306, 268, 346, 221, 232, 171, 193…
## $ shortwave_radiation_sum    <dbl> 2.66, 9.49, 9.46, 10.03, 6.87, 10.38, 10.13…
## $ et0_fao_evapotranspiration <dbl> 0.95, 1.25, 1.03, 0.88, 0.72, 1.56, 1.39, 1…
## $ sunrise_chr                <chr> "1960-01-01T08:37", "1960-01-02T08:37", "19…
## $ sunset_chr                 <chr> "1960-01-01T18:05", "1960-01-02T18:06", "19…
## $ fct_winddir                <fct> 153, 263, 306, 268, 346, 221, 232, 171, 193…

The function is tested solely for plots on an existing dataset:

omRunAllSteps(dfDaily=dfDailyMCI, useNameCity="Kansas City MO")

## 
## ACF peaks
## [1] 369 726
## 
## ACF troughs
## [1] 180 547 915

## 
## ACF peaks
##  [1] 140 158 177 202 214 238 373 462 477 489 511 533 561 580 612 628 722 735 788
## [20] 848 883 900 919 933 952 964 982
## 
## ACF troughs
##  [1] 125 142 160 180 208 221 240 319 486 506 547 589 624 638 840 855 875 897 910
## [20] 925 948 966 989

## 
## ACF peaks
##  [1]  23  34  53  67  87 115 147 166 187 207 226 243 262 277 288 349 361 374 385
## [20] 404 441 459 481 504 533 561 579 600 620 635 650 673 685 696 712 723 741 756
## [39] 767 785 820 836 881 913 925 948 962 974
## 
## ACF troughs
##  [1]  24  36  59  76  91 111 129 144 158 184 224 251 269 280 305 319 334 371 389
## [20] 402 420 462 485 508 531 550 563 585 604 628 658 678 731 743 777 799 823 842
## [39] 862 875 893 916 936 955 968 982

The function is tested solely for downloading new data:

omRunAllSteps(dlData=TRUE, 
              abbCity="msp", 
              tzCity="US/Central", 
              useNameCity="Minneapolis MN", 
              runStats=FALSE
              )
## 
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration 
## 
## 
## Download URL:
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=44.96&longitude=-93.27&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FCentral"
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 73, 73, 3, 71, 3, 3, 73, 71, 71, 3, 51, 51,…
## $ temperature_2m_max         <dbl> -2.3, -1.6, -8.3, -13.4, -14.6, -2.8, 1.6, …
## $ temperature_2m_min         <dbl> -5.2, -7.1, -21.4, -27.0, -28.0, -12.4, -3.…
## $ apparent_temperature_max   <dbl> -7.7, -4.2, -14.4, -19.6, -21.7, -7.5, -4.1…
## $ apparent_temperature_min   <dbl> -10.2, -13.2, -27.3, -32.8, -33.9, -19.2, -…
## $ precipitation_sum          <dbl> 6.2, 6.2, 0.0, 0.3, 0.0, 0.0, 1.5, 0.5, 0.3…
## $ rain_sum                   <dbl> 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0…
## $ snowfall_sum               <dbl> 4.34, 4.27, 0.00, 0.21, 0.00, 0.00, 0.98, 0…
## $ precipitation_hours        <dbl> 12, 15, 0, 3, 0, 0, 9, 3, 2, 0, 5, 4, 1, 0,…
## $ sunrise                    <dttm> 1960-01-01 08:51:00, 1960-01-02 08:51:00, …
## $ sunset                     <dttm> 1960-01-01 17:41:00, 1960-01-02 17:42:00, …
## $ windspeed_10m_max          <dbl> 18.7, 19.9, 19.5, 20.2, 25.8, 21.4, 25.0, 2…
## $ windgusts_10m_max          <dbl> 37.1, 40.7, 39.2, 40.7, 49.7, 40.0, 47.5, 4…
## $ winddirection_10m_dominant <int> 126, 322, 297, 244, 240, 245, 261, 299, 160…
## $ shortwave_radiation_sum    <dbl> 2.19, 3.71, 6.54, 6.30, 7.99, 5.99, 6.70, 7…
## $ et0_fao_evapotranspiration <dbl> 0.39, 0.39, 0.34, 0.26, 0.26, 0.42, 0.62, 0…
## $ sunrise_chr                <chr> "1960-01-01T08:51", "1960-01-02T08:51", "19…
## $ sunset_chr                 <chr> "1960-01-01T17:41", "1960-01-02T17:42", "19…
## $ fct_winddir                <fct> 126, 322, 297, 244, 240, 245, 261, 299, 160…
## NULL

The function is tested for creating a dataset from multiple previously processed cities:

omCityMap <- c("atl"="Atlanta GA", 
               "ord"="Chicago IL", 
               "dtw"="Detroit MI", 
               "msy"="New Orleans LA", 
               "den"="Denver CO"
               )

dfMultiCity <- map_dfr(.x=names(omCityMap), 
                       .f=function(x) omRunAllSteps(dlData=FALSE,runStats=FALSE, abbCity=x, returnDF=TRUE), 
                       .id="src"
                       ) %>%
    mutate(cityName=unname(omCityMap)[as.integer(src)])
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 71, 61, 63, 53, 51, 53, 61, 2, 3, 3, 3, 3, …
## $ temperature_2m_max         <dbl> 2.6, 11.4, 14.7, 7.3, 7.1, 7.3, 7.4, 10.7, …
## $ temperature_2m_min         <dbl> 0.1, 1.5, 2.0, -0.1, 0.2, 4.5, 3.1, 0.4, -0…
## $ apparent_temperature_max   <dbl> -2.3, 9.2, 13.1, 2.8, 4.4, 6.1, 4.9, 6.9, 1…
## $ apparent_temperature_min   <dbl> -4.0, -3.1, -2.7, -4.7, -3.3, 1.4, -0.5, -4…
## $ precipitation_sum          <dbl> 3.0, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ rain_sum                   <dbl> 2.5, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ snowfall_sum               <dbl> 0.35, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 13, 21, 5, 3, 3, 17, 8, 0, 0, 0, 0, 0, 0, 5…
## $ sunrise                    <dttm> 1960-01-01 07:42:00, 1960-01-02 07:42:00, …
## $ sunset                     <dttm> 1960-01-01 17:39:00, 1960-01-02 17:40:00, …
## $ windspeed_10m_max          <dbl> 20.9, 18.7, 23.8, 15.5, 9.2, 11.4, 19.2, 17…
## $ windgusts_10m_max          <dbl> 39.6, 41.4, 56.2, 44.6, 33.1, 41.8, 39.2, 3…
## $ winddirection_10m_dominant <int> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
## $ shortwave_radiation_sum    <dbl> 2.38, 1.83, 10.18, 9.37, 3.88, 1.77, 5.34, …
## $ et0_fao_evapotranspiration <dbl> 0.57, 0.38, 1.34, 1.36, 0.65, 0.33, 0.76, 1…
## $ sunrise_chr                <chr> "1960-01-01T07:42", "1960-01-02T07:42", "19…
## $ sunset_chr                 <chr> "1960-01-01T17:39", "1960-01-02T17:40", "19…
## $ fct_winddir                <fct> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 3, 51, 3, 2, 3, 1, 3, 3, 51, 51, 61, 63, 53…
## $ temperature_2m_max         <dbl> 1.6, 4.9, -0.6, -5.9, -6.1, 1.0, 4.9, 1.6, …
## $ temperature_2m_min         <dbl> -3.5, -0.9, -7.5, -11.8, -11.2, -10.3, -3.2…
## $ apparent_temperature_max   <dbl> -4.3, -0.6, -6.9, -13.1, -13.6, -4.4, -1.3,…
## $ apparent_temperature_min   <dbl> -7.9, -7.0, -14.4, -18.4, -17.7, -17.9, -8.…
## $ precipitation_sum          <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ rain_sum                   <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ snowfall_sum               <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 0, 11, 0, 0, 0, 0, 0, 0, 6, 1, 5, 24, 5, 10…
## $ sunrise                    <dttm> 1960-01-01 07:18:00, 1960-01-02 07:18:00, …
## $ sunset                     <dttm> 1960-01-01 16:29:00, 1960-01-02 16:30:00, …
## $ windspeed_10m_max          <dbl> 22.7, 27.6, 27.0, 29.1, 29.6, 32.1, 32.7, 3…
## $ windgusts_10m_max          <dbl> 40.0, 52.6, 44.6, 50.8, 48.2, 52.6, 56.5, 5…
## $ winddirection_10m_dominant <int> 142, 214, 268, 247, 261, 232, 234, 275, 185…
## $ shortwave_radiation_sum    <dbl> 7.45, 2.25, 4.58, 8.66, 9.09, 8.79, 5.86, 8…
## $ et0_fao_evapotranspiration <dbl> 0.95, 0.66, 1.06, 1.06, 1.04, 1.33, 1.23, 1…
## $ sunrise_chr                <chr> "1960-01-01T07:18", "1960-01-02T07:18", "19…
## $ sunset_chr                 <chr> "1960-01-01T16:29", "1960-01-02T16:30", "19…
## $ fct_winddir                <fct> 142, 214, 268, 247, 261, 232, 234, 275, 185…
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 3, 55, 51, 3, 3, 3, 3, 51, 51, 51, 3, 63, 6…
## $ temperature_2m_max         <dbl> 0.4, 3.2, 2.7, -3.3, -6.0, -0.8, 4.1, 2.5, …
## $ temperature_2m_min         <dbl> -4.4, -2.5, -2.9, -8.4, -9.7, -10.7, -4.9, …
## $ apparent_temperature_max   <dbl> -4.4, -1.1, -1.8, -9.2, -13.1, -7.0, -3.1, …
## $ apparent_temperature_min   <dbl> -8.1, -7.4, -8.9, -14.9, -15.2, -18.0, -9.6…
## $ precipitation_sum          <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ rain_sum                   <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ snowfall_sum               <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 0, 17, 4, 0, 0, 0, 0, 4, 3, 1, 0, 17, 6, 4,…
## $ sunrise                    <dttm> 1960-01-01 08:01:00, 1960-01-02 08:01:00, …
## $ sunset                     <dttm> 1960-01-01 17:10:00, 1960-01-02 17:10:00, …
## $ windspeed_10m_max          <dbl> 15.0, 21.5, 25.6, 23.0, 28.7, 30.3, 30.9, 3…
## $ windgusts_10m_max          <dbl> 30.2, 42.5, 48.2, 45.0, 54.4, 54.4, 57.6, 6…
## $ winddirection_10m_dominant <int> 109, 196, 261, 247, 257, 239, 223, 272, 161…
## $ shortwave_radiation_sum    <dbl> 6.09, 0.93, 4.91, 4.91, 8.07, 6.47, 4.41, 6…
## $ et0_fao_evapotranspiration <dbl> 0.55, 0.33, 0.81, 0.96, 1.04, 1.07, 1.01, 1…
## $ sunrise_chr                <chr> "1960-01-01T08:01", "1960-01-02T08:01", "19…
## $ sunset_chr                 <chr> "1960-01-01T17:10", "1960-01-02T17:10", "19…
## $ fct_winddir                <fct> 109, 196, 261, 247, 257, 239, 223, 272, 161…
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 61, 63, 51, 3, 51, 63, 51, 3, 3, 51, 3, 3, …
## $ temperature_2m_max         <dbl> 15.5, 19.1, 17.2, 12.1, 17.9, 18.2, 10.3, 1…
## $ temperature_2m_min         <dbl> 10.9, 12.3, 8.6, 6.4, 11.3, 10.3, 6.1, 4.6,…
## $ apparent_temperature_max   <dbl> 13.6, 20.8, 15.8, 9.2, 20.1, 20.2, 7.7, 9.1…
## $ apparent_temperature_min   <dbl> 6.6, 9.8, 4.1, 2.7, 8.0, 7.3, 2.5, 1.7, 4.8…
## $ precipitation_sum          <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ rain_sum                   <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ snowfall_sum               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ precipitation_hours        <dbl> 12, 12, 2, 0, 7, 19, 1, 0, 0, 1, 0, 0, 0, 8…
## $ sunrise                    <dttm> 1960-01-01 07:55:00, 1960-01-02 07:55:00, …
## $ sunset                     <dttm> 1960-01-01 18:10:00, 1960-01-02 18:11:00, …
## $ windspeed_10m_max          <dbl> 28.9, 18.8, 29.8, 16.2, 21.0, 19.7, 22.7, 1…
## $ windgusts_10m_max          <dbl> 53.6, 36.4, 50.0, 27.7, 34.6, 59.4, 39.2, 2…
## $ winddirection_10m_dominant <int> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…
## $ shortwave_radiation_sum    <dbl> 7.86, 4.91, 13.14, 13.63, 4.33, 1.05, 10.67…
## $ et0_fao_evapotranspiration <dbl> 1.05, 0.79, 2.22, 2.17, 1.14, 0.28, 1.30, 1…
## $ sunrise_chr                <chr> "1960-01-01T07:55", "1960-01-02T07:55", "19…
## $ sunset_chr                 <chr> "1960-01-01T18:10", "1960-01-02T18:11", "19…
## $ fct_winddir                <fct> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 75, 3, 1, 3, 1, 3, 1, 3, 3, 3, 3, 51, 73, 7…
## $ temperature_2m_max         <dbl> -4.5, -5.7, -7.4, -1.9, -1.5, 3.4, 5.4, 9.6…
## $ temperature_2m_min         <dbl> -11.0, -19.2, -22.5, -22.8, -20.8, -16.0, -…
## $ apparent_temperature_max   <dbl> -8.9, -10.0, -11.9, -6.2, -5.5, -0.3, 0.5, …
## $ apparent_temperature_min   <dbl> -15.8, -24.5, -28.3, -28.8, -26.4, -21.4, -…
## $ precipitation_sum          <dbl> 8.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum                   <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum               <dbl> 5.88, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 17,…
## $ sunrise                    <dttm> 1960-01-01 08:20:00, 1960-01-02 08:20:00, …
## $ sunset                     <dttm> 1960-01-01 17:45:00, 1960-01-02 17:46:00, …
## $ windspeed_10m_max          <dbl> 22.1, 9.8, 13.3, 14.5, 14.3, 13.4, 16.1, 17…
## $ windgusts_10m_max          <dbl> 36.0, 15.8, 21.2, 23.4, 21.2, 22.3, 25.9, 2…
## $ winddirection_10m_dominant <int> 15, 156, 154, 149, 175, 196, 166, 182, 188,…
## $ shortwave_radiation_sum    <dbl> 6.21, 10.29, 10.29, 11.02, 10.95, 10.69, 10…
## $ et0_fao_evapotranspiration <dbl> 0.51, 0.73, 0.71, 0.89, 0.90, 1.17, 1.26, 1…
## $ sunrise_chr                <chr> "1960-01-01T08:20", "1960-01-02T08:20", "19…
## $ sunset_chr                 <chr> "1960-01-01T17:45", "1960-01-02T17:46", "19…
## $ fct_winddir                <fct> 15, 156, 154, 149, 175, 196, 166, 182, 188,…
dfMultiCity %>% count(cityName)
## # A tibble: 5 × 2
##   cityName           n
##   <chr>          <int>
## 1 Atlanta GA     23376
## 2 Chicago IL     23376
## 3 Denver CO      23376
## 4 Detroit MI     23376
## 5 New Orleans LA 23376

Mean and SEM are plotted for temperature for each city, as well as the total:

nYears <- length(unique(year(dfMultiCity$date)))
divBy <- sqrt(nYears-1)

dfMultiCity %>%
    select(date, temperature_2m_max, cityName) %>%
    bind_rows(mutate(., cityName="Overall")) %>%
    group_by(cityName, date) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    arrange(cityName, date) %>%
    mutate(doy=pmin(yday(date), 365)) %>%
    group_by(cityName) %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="mu_r21", k=21) %>%
    group_by(cityName, doy) %>%
    mutate(sd=sd(temperature_2m_max)) %>%
    group_by(cityName) %>%
    helperRollingAgg(origVar="sd", newName="sd_r21", k=21) %>%
    ungroup() %>%
    select(cityName, doy, mu_r21, sd_r21) %>%
    pivot_longer(cols=-c(cityName, doy)) %>%
    na.omit() %>%
    group_by(cityName, doy, name) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_wider(id_cols=c(cityName, doy), names_from="name") %>%
    ggplot(aes(x=doy)) + 
    geom_line(data=~filter(., cityName!="Overall"), 
              aes(y=mu_r21, group=cityName, color=cityName), 
              lwd=2
              ) + 
    geom_ribbon(data=~filter(., cityName!="Overall"), 
                aes(ymin=mu_r21-sd_r21/divBy, ymax=mu_r21+sd_r21/divBy, fill=cityName), 
                alpha=0.5
                ) +
    labs(x="Day of Year", 
         y=paste0("Rolling ", 21, "-day mean +/- 1 ", if(isTRUE(TRUE)) "SEM (approx)" else "sd"), 
         title=paste0("Rolling ", 
                      21, 
                      "-day mean +/- 1 rolling ", 
                      21, 
                      "-day ", 
                      if(isTRUE(TRUE)) "SEM (approx)" else "sd", "\nTemperature Max (C)")
         ) + 
    scale_color_discrete(NULL) + 
    scale_fill_discrete(NULL) + 
    geom_line(data=~filter(., cityName=="Overall"), aes(y=mu_r21, group=cityName), lwd=1, lty=2)

Mean and SEM are plotted for precipitation for each city, as well as the total:

nYears <- length(unique(year(dfMultiCity$date)))
divBy <- sqrt(nYears-1)

dfMultiCity %>%
    select(date, precipitation_sum, cityName) %>%
    bind_rows(mutate(., cityName="Overall")) %>%
    group_by(cityName, date) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    arrange(cityName, date) %>%
    mutate(doy=pmin(yday(date), 365)) %>%
    group_by(cityName) %>%
    helperRollingAgg(origVar="precipitation_sum", newName="mu_r21", k=21) %>%
    group_by(cityName, doy) %>%
    mutate(sd=sd(precipitation_sum)) %>%
    group_by(cityName) %>%
    helperRollingAgg(origVar="sd", newName="sd_r21", k=21) %>%
    ungroup() %>%
    select(cityName, doy, mu_r21, sd_r21) %>%
    pivot_longer(cols=-c(cityName, doy)) %>%
    na.omit() %>%
    group_by(cityName, doy, name) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_wider(id_cols=c(cityName, doy), names_from="name") %>%
    ggplot(aes(x=doy)) + 
    geom_line(data=~filter(., cityName!="Overall"), 
              aes(y=mu_r21, group=cityName, color=cityName), 
              lwd=2
              ) + 
    geom_ribbon(data=~filter(., cityName!="Overall"), 
                aes(ymin=mu_r21-sd_r21/divBy, ymax=mu_r21+sd_r21/divBy, fill=cityName), 
                alpha=0.5
                ) +
    labs(x="Day of Year", 
         y=paste0("Rolling ", 21, "-day mean +/- 1 ", if(isTRUE(TRUE)) "SEM (approx)" else "sd"), 
         title=paste0("Rolling ", 
                      21, 
                      "-day mean +/- 1 rolling ", 
                      21, 
                      "-day ", 
                      if(isTRUE(TRUE)) "SEM (approx)" else "sd", "\nPrecipitation (mm)")
         ) + 
    scale_color_discrete(NULL) + 
    scale_fill_discrete(NULL) + 
    geom_line(data=~filter(., cityName=="Overall"), aes(y=mu_r21, group=cityName), lwd=1, lty=2)

Mean and SEM are plotted for wind speed for each city, as well as the total:

nYears <- length(unique(year(dfMultiCity$date)))
divBy <- sqrt(nYears-1)

dfMultiCity %>%
    select(date, windspeed_10m_max, cityName) %>%
    bind_rows(mutate(., cityName="Overall")) %>%
    group_by(cityName, date) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    arrange(cityName, date) %>%
    mutate(doy=pmin(yday(date), 365)) %>%
    group_by(cityName) %>%
    helperRollingAgg(origVar="windspeed_10m_max", newName="mu_r21", k=21) %>%
    group_by(cityName, doy) %>%
    mutate(sd=sd(windspeed_10m_max)) %>%
    group_by(cityName) %>%
    helperRollingAgg(origVar="sd", newName="sd_r21", k=21) %>%
    ungroup() %>%
    select(cityName, doy, mu_r21, sd_r21) %>%
    pivot_longer(cols=-c(cityName, doy)) %>%
    na.omit() %>%
    group_by(cityName, doy, name) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_wider(id_cols=c(cityName, doy), names_from="name") %>%
    ggplot(aes(x=doy)) + 
    geom_line(data=~filter(., cityName!="Overall"), 
              aes(y=mu_r21, group=cityName, color=cityName), 
              lwd=2
              ) + 
    geom_ribbon(data=~filter(., cityName!="Overall"), 
                aes(ymin=mu_r21-sd_r21/divBy, ymax=mu_r21+sd_r21/divBy, fill=cityName), 
                alpha=0.5
                ) +
    labs(x="Day of Year", 
         y=paste0("Rolling ", 21, "-day mean +/- 1 ", if(isTRUE(TRUE)) "SEM (approx)" else "sd"), 
         title=paste0("Rolling ", 
                      21, 
                      "-day mean +/- 1 rolling ", 
                      21, 
                      "-day ", 
                      if(isTRUE(TRUE)) "SEM (approx)" else "sd", "\nMax Wind Speed (kph)")
         ) + 
    scale_color_discrete(NULL) + 
    scale_fill_discrete(NULL) + 
    geom_line(data=~filter(., cityName=="Overall"), aes(y=mu_r21, group=cityName), lwd=1, lty=2)

A function is written to create plots of continuous variables across cities:

omMultiCityCont <- function(df, varName, varDesc, divBy) {

    p1 <- df %>%
        select(all_of(c("date", varName, "cityName"))) %>%
        bind_rows(mutate(., cityName="Overall")) %>%
        group_by(cityName, date) %>%
        summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
        arrange(cityName, date) %>%
        mutate(doy=pmin(yday(date), 365)) %>%
        group_by(cityName) %>%
        helperRollingAgg(origVar=varName, newName="mu_r21", k=21) %>%
        group_by(cityName, doy) %>%
        mutate(sd=sd(get(varName))) %>%
        group_by(cityName) %>%
        helperRollingAgg(origVar="sd", newName="sd_r21", k=21) %>%
        ungroup() %>%
        select(cityName, doy, mu_r21, sd_r21) %>%
        pivot_longer(cols=-c(cityName, doy)) %>%
        na.omit() %>%
        group_by(cityName, doy, name) %>%
        summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
        pivot_wider(id_cols=c(cityName, doy), names_from="name") %>%
        ggplot(aes(x=doy)) + 
        geom_line(data=~filter(., cityName!="Overall"), 
                  aes(y=mu_r21, group=cityName, color=cityName), 
                  lwd=2
                  ) + 
        geom_ribbon(data=~filter(., cityName!="Overall"), 
                    aes(ymin=mu_r21-sd_r21/divBy, ymax=mu_r21+sd_r21/divBy, fill=cityName), 
                    alpha=0.5
                    ) +
        labs(x="Day of Year", 
             y=paste0("Rolling ", 21, "-day mean +/- 1 ", if(isTRUE(TRUE)) "SEM (approx)" else "sd"), 
             title=paste0("Rolling ", 
                          21, 
                          "-day mean +/- 1 rolling ", 
                          21, 
                          "-day ", 
                          if(isTRUE(TRUE)) "SEM (approx)" else "sd", "\n", varDesc)
             ) + 
        scale_color_discrete(NULL) + 
        scale_fill_discrete(NULL) + 
        geom_line(data=~filter(., cityName=="Overall"), aes(y=mu_r21, group=cityName), lwd=1, lty=2)
    
    p1
    
}

The function is tested on a new variable:

omMultiCityCont(dfMultiCity, 
                varName="shortwave_radiation_sum", 
                varDesc="Daily solar radiation (MJ)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested on evapotranspiration:

omMultiCityCont(dfMultiCity, 
                varName="et0_fao_evapotranspiration", 
                varDesc="Daily sum of ET0 Reference Evapotranspiration", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested on snowfall:

omMultiCityCont(dfMultiCity, 
                varName="snowfall_sum", 
                varDesc="Daily average snowfall (cm)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested on apparent temperature:

omMultiCityCont(dfMultiCity, 
                varName="apparent_temperature_max", 
                varDesc="Maximum Apparent Temperature (C)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested for max-min temperature spread:

omMultiCityCont(dfMultiCity %>% mutate(dTemp=temperature_2m_max-temperature_2m_min), 
                varName="dTemp", 
                varDesc="Max-Min Temperature Spread (C)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested for daily change in mean temperature:

omMultiCityCont(dfMultiCity %>% 
                    mutate(muTemp=0.5*(temperature_2m_max+temperature_2m_min)) %>%
                    arrange(cityName, date) %>%
                    group_by(cityName) %>%
                    mutate(dTemp=muTemp-lag(muTemp)) %>%
                    ungroup() %>%
                    na.omit(), 
                varName="dTemp", 
                varDesc="Daily Change in Mean Temperature (C)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested on wind gusts:

omMultiCityCont(dfMultiCity, 
                varName="windgusts_10m_max", 
                varDesc="Daily Maximum Wind Gusts (kph)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested on sunrise:

omMultiCityCont(dfMultiCity %>% 
                    mutate(sr=60*hour(sunrise)+minute(sunrise)), 
                varName="sr", 
                varDesc="Sunrise (minutes after midnight, local standard time)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested on sunset:

omMultiCityCont(dfMultiCity %>% 
                    mutate(ss=60*hour(sunset)+minute(sunset)), 
                varName="ss", 
                varDesc="Sunset (minutes after midnight, local standard time)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested on daylight length:

omMultiCityCont(dfMultiCity %>% 
                    mutate(dl=60*hour(sunset)+minute(sunset)-60*hour(sunrise)-minute(sunrise)), 
                varName="dl", 
                varDesc="Daylight length (minutes)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested for daily change in daylight length:

omMultiCityCont(dfMultiCity %>% 
                    mutate(dl=60*hour(sunset)+minute(sunset)-60*hour(sunrise)-minute(sunrise)) %>%
                    arrange(cityName, date) %>%
                    group_by(cityName) %>%
                    mutate(ddl=dl-lag(dl)) %>%
                    ungroup() %>%
                    na.omit(), 
                varName="ddl", 
                varDesc="Daily Change in Daylight Length (minutes)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested for daily change in sunrise:

omMultiCityCont(dfMultiCity %>% 
                    mutate(dl=60*hour(sunrise)+minute(sunrise)) %>%
                    arrange(cityName, date) %>%
                    group_by(cityName) %>%
                    mutate(ddl=dl-lag(dl)) %>%
                    ungroup() %>%
                    na.omit(), 
                varName="ddl", 
                varDesc="Daily Change in Sunrise (minutes)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested for daily change in sunset:

omMultiCityCont(dfMultiCity %>% 
                    mutate(dl=60*hour(sunset)+minute(sunset)) %>%
                    arrange(cityName, date) %>%
                    group_by(cityName) %>%
                    mutate(ddl=dl-lag(dl)) %>%
                    ungroup() %>%
                    na.omit(), 
                varName="ddl", 
                varDesc="Daily Change in Sunset (minutes)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested on solar noon:

omMultiCityCont(dfMultiCity %>% 
                    mutate(sn=30*hour(sunrise)+0.5*minute(sunrise)+30*hour(sunset)+0.5*minute(sunset)), 
                varName="sn", 
                varDesc="Solar noon (minutes after midnight, local standard time)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

The function is tested on change in solar noon:

omMultiCityCont(dfMultiCity %>% 
                    mutate(sn=30*hour(sunrise)+0.5*minute(sunrise)+30*hour(sunset)+0.5*minute(sunset)) %>%
                    arrange(cityName, date) %>%
                    group_by(cityName) %>%
                    mutate(dsn=sn-lag(sn)) %>%
                    ungroup() %>%
                    na.omit(), 
                varName="dsn", 
                varDesc="Daily change in solar noon (minutes)", 
                divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
                )

Dominant wind direction is explored:

dfMultiCity %>% 
    mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360, 
           fct_month=factor(month.abb[month(date)], levels=month.abb)
           ) %>%
    count(cityName, wd45) %>% 
    ggplot(aes(x=wd45, y=n)) + 
    geom_col(fill="lightblue") +
    facet_wrap(~cityName) + 
    coord_polar(start=-pi/8) +
    annotate("point", x=0, y=0) + 
    theme(axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
    scale_x_continuous(lim=c(-22.5, 337.5), breaks=seq(0, 360, by=45)) +
    labs(title="Predominant daily wind direction by city (1960-2023)", 
         subtitle="Rounded to nearest 45 degrees", 
         x=NULL, 
         y=NULL
         )

Dominant wind direction is explored by month:

dfMultiCity %>% 
    mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360, 
           fct_month=factor(month.abb[month(date)], levels=month.abb)
           ) %>%
    count(fct_month, wd45) %>% 
    ggplot(aes(x=wd45, y=n)) + 
    geom_col(fill="lightblue") +
    facet_wrap(~fct_month) + 
    coord_polar(start=-pi/8) +
    annotate("point", x=0, y=0) + 
    theme(axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
    scale_x_continuous(lim=c(-22.5, 337.5), breaks=seq(0, 360, by=45)) +
    labs(title="Predominant daily wind direction by month (5 cities, 1960-2023)", 
         subtitle="Rounded to nearest 45 degrees", 
         x=NULL, 
         y=NULL
         )

Dominant wind direction is explored by month, by proportion rather than by raw counts:

dfMultiCity %>% 
    mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360, 
           fct_month=factor(month.abb[month(date)], levels=month.abb)
           ) %>%
    count(fct_month, wd45) %>% 
    group_by(fct_month) %>%
    mutate(pct=n/sum(n)) %>%
    ungroup() %>%
    ggplot(aes(x=wd45, y=pct)) + 
    geom_col(fill="lightblue") +
    facet_wrap(~fct_month) + 
    coord_polar(start=-pi/8) +
    annotate("point", x=0, y=0) + 
    theme(axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
    scale_x_continuous(lim=c(-22.5, 337.5), breaks=seq(0, 360, by=45)) +
    labs(title="Predominant daily wind direction by month (5 cities, 1960-2023)", 
         subtitle="Rounded to nearest 45 degrees", 
         x=NULL, 
         y=NULL
         )

Dominant wind direction is explored by month for a single city:

dfMultiCity %>% 
    mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360, 
           fct_month=factor(month.abb[month(date)], levels=month.abb)
           ) %>%
    filter(cityName=="Chicago IL") %>%
    count(fct_month, wd45) %>% 
    group_by(fct_month) %>%
    mutate(pct=n/sum(n)) %>%
    ungroup() %>%
    ggplot(aes(x=wd45, y=pct)) + 
    geom_col(fill="lightblue") +
    facet_wrap(~fct_month) + 
    coord_polar(start=-pi/8) +
    annotate("point", x=0, y=0) + 
    theme(axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
    scale_x_continuous(lim=c(-22.5, 337.5), breaks=seq(0, 360, by=45)) +
    labs(title="Predominant daily wind direction by month (Chicago IL, 1960-2023)", 
         subtitle="Rounded to nearest 45 degrees", 
         x=NULL, 
         y=NULL
         )

Dominant wind direction is explored by city for a single month:

dfMultiCity %>% 
    mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360, 
           fct_month=factor(month.abb[month(date)], levels=month.abb)
           ) %>%
    filter(fct_month=="Jun") %>%
    count(cityName, wd45) %>% 
    group_by(cityName) %>%
    mutate(pct=n/sum(n)) %>%
    ungroup() %>%
    ggplot(aes(x=wd45, y=pct)) + 
    geom_col(fill="lightblue") +
    facet_wrap(~cityName) + 
    coord_polar(start=-pi/8) +
    annotate("point", x=0, y=0) + 
    theme(axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
    scale_x_continuous(lim=c(-22.5, 337.5), breaks=seq(0, 360, by=45)) +
    labs(title="Predominant daily wind direction by city (June, 1960-2023)", 
         subtitle="Rounded to nearest 45 degrees", 
         x=NULL, 
         y=NULL
         )

Cities are explored for similarity of wind direction:

pctWindDirCity <- dfMultiCity %>% 
    mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360, 
           fct_month=factor(month.abb[month(date)], levels=month.abb), 
           fct_wd45=factor(wd45, 
                           levels=seq(0, 315, by=45), 
                           labels=c("N", "NE", "E", "SE", "S", "SW", "W", "NW")
                           )
           ) %>%
    count(cityName, fct_wd45) %>% 
    group_by(cityName) %>%
    mutate(pct=n/sum(n)) %>%
    ungroup() %>%
    select(cityName, fct_wd45, pct) %>%
    pivot_wider(id_cols=c("cityName"), names_from="fct_wd45", values_from="pct")
pctWindDirCity
## # A tibble: 5 × 9
##   cityName            N     NE      E     SE      S    SW      W     NW
##   <chr>           <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl>
## 1 Atlanta GA     0.0620 0.0568 0.178  0.117  0.0959 0.114 0.150  0.226 
## 2 Chicago IL     0.112  0.129  0.0722 0.0730 0.154  0.201 0.157  0.101 
## 3 Denver CO      0.0945 0.0834 0.0802 0.132  0.319  0.158 0.0745 0.0581
## 4 Detroit MI     0.109  0.103  0.0768 0.0689 0.116  0.217 0.180  0.128 
## 5 New Orleans LA 0.114  0.100  0.147  0.188  0.151  0.113 0.0974 0.0890
reshape2::melt(as.matrix(pctWindDirCity %>% select(-cityName) %>% dist()),
               varnames = c("row", "col")
               ) %>%
    mutate(city1=pctWindDirCity$cityName[row], city2=pctWindDirCity$cityName[col]) %>%
    filter(row!=col) %>%
    ggplot(aes(x=city1, y=city2)) + 
    geom_tile(aes(fill=value)) + 
    scale_fill_continuous("Dist", low="green", high="red") + 
    geom_text(aes(label=round(value, 2))) + 
    labs(title="Distance based on predominant daily wind direction\n(1960-2023)", 
         x=NULL, 
         y=NULL,
         caption="Predominant daily wind rounded to nearest 45 degrees\n(N, NE, E, SE, S, SW, W, NW)"
         )

Months are explored for similarity of wind direction:

pctWindDirMonth <- dfMultiCity %>% 
    mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360, 
           fct_month=factor(month.abb[month(date)], levels=month.abb), 
           fct_wd45=factor(wd45, 
                           levels=seq(0, 315, by=45), 
                           labels=c("N", "NE", "E", "SE", "S", "SW", "W", "NW")
                           )
           ) %>%
    count(fct_month, fct_wd45) %>% 
    group_by(fct_month) %>%
    mutate(pct=n/sum(n)) %>%
    ungroup() %>%
    select(fct_month, fct_wd45, pct) %>%
    pivot_wider(id_cols=c("fct_month"), names_from="fct_wd45", values_from="pct")
pctWindDirMonth
## # A tibble: 12 × 9
##    fct_month      N     NE      E     SE     S    SW      W     NW
##    <fct>      <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl>
##  1 Jan       0.0851 0.0724 0.0884 0.0961 0.170 0.168 0.166  0.155 
##  2 Feb       0.109  0.0948 0.0996 0.0938 0.156 0.149 0.144  0.154 
##  3 Mar       0.118  0.100  0.106  0.125  0.167 0.128 0.116  0.140 
##  4 Apr       0.111  0.0992 0.105  0.139  0.171 0.135 0.111  0.128 
##  5 May       0.108  0.108  0.109  0.150  0.169 0.147 0.110  0.0981
##  6 Jun       0.0848 0.102  0.103  0.118  0.176 0.199 0.129  0.0870
##  7 Jul       0.0824 0.0848 0.0803 0.100  0.147 0.225 0.179  0.102 
##  8 Aug       0.0877 0.0981 0.121  0.108  0.155 0.199 0.138  0.0935
##  9 Sep       0.104  0.125  0.169  0.126  0.156 0.136 0.0907 0.0934
## 10 Oct       0.114  0.0979 0.145  0.117  0.163 0.130 0.115  0.117 
## 11 Nov       0.0949 0.0824 0.107  0.109  0.192 0.149 0.132  0.134 
## 12 Dec       0.0822 0.0713 0.0974 0.107  0.187 0.161 0.151  0.143
reshape2::melt(as.matrix(pctWindDirMonth %>% select(-fct_month) %>% dist()),
               varnames = c("row", "col")
               ) %>%
    mutate(city1=pctWindDirMonth$fct_month[row], city2=pctWindDirMonth$fct_month[col]) %>%
    filter(row!=col) %>%
    ggplot(aes(x=city1, y=city2)) + 
    geom_tile(aes(fill=value)) + 
    scale_fill_continuous("Dist", low="green", high="red") + 
    geom_text(aes(label=round(value, 2))) + 
    labs(title="Distance based on predominant daily wind direction\n(5 cities, 1960-2023)", 
         x=NULL, 
         y=NULL,
         caption="Predominant daily wind rounded to nearest 45 degrees\n(N, NE, E, SE, S, SW, W, NW)"
         )

Month-city is explored for similarity of wind direction:

pctWindDirMonthCity <- dfMultiCity %>% 
    mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360, 
           fct_month=factor(month.abb[month(date)], levels=month.abb), 
           fct_wd45=factor(wd45, 
                           levels=seq(0, 315, by=45), 
                           labels=c("N", "NE", "E", "SE", "S", "SW", "W", "NW")
                           )
           ) %>%
    count(cityName, fct_month, fct_wd45) %>% 
    group_by(cityName, fct_month) %>%
    mutate(pct=n/sum(n)) %>%
    ungroup() %>%
    select(cityName, fct_month, fct_wd45, pct) %>%
    pivot_wider(id_cols=c("cityName", "fct_month"), names_from="fct_wd45", values_from="pct")
pctWindDirMonthCity
## # A tibble: 60 × 10
##    cityName   fct_month      N     NE     E     SE      S     SW      W    NW
##    <chr>      <fct>      <dbl>  <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>
##  1 Atlanta GA Jan       0.0509 0.0368 0.137 0.0912 0.0917 0.102  0.176  0.315
##  2 Atlanta GA Feb       0.0476 0.0459 0.146 0.0973 0.0918 0.114  0.160  0.296
##  3 Atlanta GA Mar       0.0570 0.0287 0.134 0.122  0.130  0.126  0.139  0.265
##  4 Atlanta GA Apr       0.0427 0.0339 0.108 0.131  0.149  0.155  0.155  0.225
##  5 Atlanta GA May       0.0726 0.0514 0.141 0.139  0.111  0.138  0.163  0.184
##  6 Atlanta GA Jun       0.0682 0.0599 0.158 0.120  0.0990 0.146  0.192  0.156
##  7 Atlanta GA Jul       0.0539 0.0423 0.132 0.0958 0.0847 0.178  0.231  0.182
##  8 Atlanta GA Aug       0.0595 0.0922 0.235 0.119  0.0781 0.124  0.135  0.157
##  9 Atlanta GA Sep       0.0833 0.114  0.326 0.138  0.0568 0.0521 0.0719 0.159
## 10 Atlanta GA Oct       0.0917 0.0751 0.271 0.119  0.0635 0.0570 0.0988 0.224
## # ℹ 50 more rows
reshape2::melt(as.matrix(pctWindDirMonthCity %>% select(-fct_month, -cityName) %>% dist()),
               varnames = c("row", "col")
               ) %>%
    mutate(city1=pctWindDirMonthCity$cityName[row], 
           city2=pctWindDirMonthCity$cityName[col], 
           month1=pctWindDirMonthCity$fct_month[row], 
           month2=pctWindDirMonthCity$fct_month[col]
           ) %>%
    tibble::as_tibble() %>%
    select(city1, month1, city2, month2, dist=value, everything()) %>%
    filter(row<col) %>%
    arrange(dist) %>%
    filter(row_number()<=10 | row_number()>(max(row_number())-10))
## # A tibble: 20 × 7
##    city1          month1 city2          month2   dist   row   col
##    <chr>          <fct>  <chr>          <fct>   <dbl> <int> <int>
##  1 Denver CO      Jan    Denver CO      Dec    0.0220    25    36
##  2 Detroit MI     Jun    Detroit MI     Aug    0.0268    42    44
##  3 Atlanta GA     Feb    Atlanta GA     Dec    0.0295     2    12
##  4 Detroit MI     Jan    Detroit MI     Dec    0.0301    37    48
##  5 Denver CO      Feb    Denver CO      Oct    0.0304    26    34
##  6 Atlanta GA     Jan    Atlanta GA     Feb    0.0310     1     2
##  7 New Orleans LA Jan    New Orleans LA Dec    0.0313    49    60
##  8 Detroit MI     Apr    Detroit MI     May    0.0325    40    41
##  9 Denver CO      Jan    Denver CO      Nov    0.0329    25    35
## 10 Detroit MI     Jul    Detroit MI     Aug    0.0373    43    44
## 11 Chicago IL     Jan    Denver CO      Dec    0.465     13    36
## 12 Denver CO      Jan    Detroit MI     Jan    0.467     25    37
## 13 Atlanta GA     Feb    Denver CO      Dec    0.472      2    36
## 14 Atlanta GA     Oct    Denver CO      Jan    0.476     10    25
## 15 Atlanta GA     Jan    Denver CO      Jan    0.478      1    25
## 16 Atlanta GA     Oct    Denver CO      Dec    0.482     10    36
## 17 Denver CO      Dec    Detroit MI     Jan    0.483     36    37
## 18 Atlanta GA     Sep    Denver CO      Jan    0.484      9    25
## 19 Atlanta GA     Jan    Denver CO      Dec    0.487      1    36
## 20 Atlanta GA     Sep    Denver CO      Dec    0.489      9    36